
The Cochrane Library

Trusted evidence. Informed decisions. Better health.



About Cochrane Reviews

What is a systematic review?

A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a specific research question. Researchers conducting systematic reviews use explicit, systematic methods that are selected with a view to minimizing bias, to produce more reliable findings that inform decision making.

What is a Cochrane Review?

A Cochrane Review is a systematic review of research in health care and health policy that is published in the Cochrane Database of Systematic Reviews.

Types of Cochrane Review

Intervention reviews assess the effectiveness/safety of a treatment, vaccine, device, preventative measure, procedure or policy.

Diagnostic test accuracy reviews assess the accuracy of a test, device or scale to aid diagnosis.

Prognosis reviews describe and predict the course of individuals with a disease or health condition.

Qualitative evidence syntheses investigate perspectives and experiences of an intervention or health condition.

Methodology reviews explore or validate how research is designed, conducted, reported or used.

Overviews of reviews synthesize information from multiple systematic reviews on related research questions.

Rapid reviews are systematic reviews accelerated through streamlining or omitting specific methods.

Prototype reviews include other types of systematic review that do not yet have established standard methodology in Cochrane, such as scoping reviews, mixed-methods reviews, reviews of prevalence studies, and realist reviews.

Cochrane Review methods

Cochrane Reviews base their findings on the results of studies that meet certain quality criteria, since the most reliable studies will provide the best evidence for making decisions about health care. Authors of Cochrane Reviews apply methods which reduce the impact of bias across different parts of the review process, including:

Identification of relevant studies from a number of different sources (including unpublished sources);

Selection of studies for inclusion and evaluation of their strengths and limitations on the basis of clear, predefined criteria;

Systematic collection of data;

Appropriate synthesis of data.

These methods are described in detail in the Cochrane Handbook for Systematic Reviews of Interventions and the Cochrane Handbook for Diagnostic Test Accuracy Reviews.

Cochrane Reviews are updated to reflect the findings of new evidence when it becomes available because the results of new studies can change the conclusions of a review. Cochrane Reviews are therefore valuable sources of information for those receiving and providing care, as well as for decision-makers and researchers.

What is a meta-analysis?

If the results of the individual studies are combined to produce an overall statistic, this is usually called a meta-analysis. Many Cochrane Reviews measure benefits and harms by collecting data from more than one trial, and combining them to generate an average result. This aims to provide a more precise estimate of the effects of an intervention and to reduce uncertainty.

Not every review in the Cochrane Database of Systematic Reviews contains a meta-analysis. A meta-analysis may not be appropriate if the designs of the studies are too different, if the outcomes measured are not sufficiently similar, or if there are concerns about the quality of the studies, because in those cases an average result across the studies would not be meaningful.

What is a summary of findings table?

A summary of findings table presents the main findings of a review in a transparent and simple tabular format. In particular, the tables provide key information concerning the quality of the evidence, the magnitude of the effects of the interventions examined, and the sum of available data on the main outcomes. Most reviews would be expected to have a single summary of findings table; some include more than one, for example if the review addresses more than one major comparison, or substantially different populations.

What is a protocol?

All research should be carried out according to a pre-defined plan. Cochrane researchers use the protocol to describe the proposed approach for a systematic review. It outlines the question that the review authors are addressing, detailing the criteria against which studies will be assessed for inclusion in the review, and describing how the authors will manage the review process. Protocols contain information that defines the health problem and the intervention under investigation, how benefits and harms will be measured, and the type of appropriate study design. The protocol also outlines the process for identifying, assessing, and summarizing studies in the review. By making this information available, the protocol is a public record of how the review authors intend to answer their research question.



1.2.2  What is a systematic review?

A systematic review attempts to collate all empirical evidence that fits pre-specified eligibility criteria in order to answer a specific research question. It uses explicit, systematic methods that are selected with a view to minimizing bias, thus providing more reliable findings from which conclusions can be drawn and decisions made (Antman 1992, Oxman 1993). The key characteristics of a systematic review are:

a clearly stated set of objectives with pre-defined eligibility criteria for studies;

an explicit, reproducible methodology;

a systematic search that attempts to identify all studies that would meet the eligibility criteria;

an assessment of the validity of the findings of the included studies, for example through the assessment of risk of bias; and

a systematic presentation, and synthesis, of the characteristics and findings of the included studies.

Many systematic reviews contain meta-analyses. Meta-analysis is the use of statistical methods to summarize the results of independent studies (Glass 1976). By combining information from all relevant studies, meta-analyses can provide more precise estimates of the effects of health care than those derived from the individual studies included within a review (see Chapter 9, Section 9.1.3 ). They also facilitate investigations of the consistency of evidence across studies, and the exploration of differences across studies.
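The pooling step described above can be sketched as a simple fixed-effect (inverse-variance) meta-analysis, in which more precise studies receive larger weights. The study effects and variances below are hypothetical, and real Cochrane meta-analyses involve further choices (such as random-effects models and heterogeneity assessment) not shown here:

```python
# Illustrative fixed-effect (inverse-variance) meta-analysis.
# The study effect estimates and variances are hypothetical.
import math

def inverse_variance_pool(effects, variances):
    """Pool study effects, weighting each by 1/variance.

    More precise studies (smaller variance) get larger weights,
    which is why a meta-analysis can give a more precise estimate
    than any single included study.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)            # variance of the pooled estimate
    half_width = 1.96 * math.sqrt(pooled_var)  # 95% confidence interval
    return pooled, (pooled - half_width, pooled + half_width)

# Three hypothetical trials reporting the same effect measure (e.g. a log risk ratio)
effects = [-0.30, -0.10, -0.25]
variances = [0.04, 0.02, 0.08]

pooled, ci = inverse_variance_pool(effects, variances)
```

The pooled confidence interval is narrower than that of even the most precise single study, which is the sense in which meta-analysis "provides more precise estimates" than the individual studies.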

Cochrane Consumer Network

Cochrane and systematic reviews

  • How do I know an intervention works?
  • What consumers can and cannot get from a review
  • Levels of evidence
  • Cochrane groups



The Cochrane Library is an electronic collection of databases published on the internet and also available on CD-ROM. It is updated quarterly to add new material and keep the information current. The Library is made up of a number of parts.

The Cochrane Database of Systematic Reviews (CDSR) contains the published Cochrane reviews and protocols.

The Cochrane Central Register of Controlled Trials (CENTRAL) collates references to controlled trials in health care. These healthcare trial references are entered by Cochrane groups. The main way of finding health care studies is by looking in electronic databases (such as MEDLINE, EMBASE, CINAHL) using special search terms. Other ways are by asking experts in a particular health field and through hand searching journals.

The Database of Abstracts of Reviews of Effects (DARE) is a collection of structured abstracts and bibliographic references of systematic reviews of the effects of health care. It is developed by the Centre for Research and Dissemination, University of York, UK.

Methodological reviews and articles are also presented in The Cochrane Library.

In addition, each Cochrane group (termed an entity) has a section (module) in the Library that gives information on the group’s organisation, contact details, function, reviews, and other general information.

Accessing The Cochrane Library

Abstracts of reviews are readily accessible at www.cochrane.org/reviews . In some countries and regions, such as Australia, Denmark, Finland, Ireland, Latin America, Norway and the UK, the full reviews are freely available because their governments hold subscriptions to The Cochrane Library. Consumers who live elsewhere and wish to read a full review may need to access The Cochrane Library through a university, hospital or large public library.

A Cochrane Library Users Guide is available ( https://www.cochrane.org.au/libraryguide/ ) to help you find the information you want from The Cochrane Library.

Brief summaries (plain language summaries) of Cochrane reviews are written for consumers and others to highlight the information in a review. A What’s New Digest summarises the newest reviews.

If you would like to make comments on any existing review in The Cochrane Library, you will find a special section for 'Comments and Criticisms' with the review.

If someone decides to look critically at articles that have appeared in the medical or health literature on a particular topic, they are said to be ‘reviewing the literature’. The authors may review, say, all the drug treatments available for one type of heart disease. A review is very clearly defined: it sets out to find what evidence there is for prescribing one particular intervention or drug in a specific health condition, often in a certain group of people.

Examples of review topics are: Single dose celecoxib for acute postoperative pain; Artichoke leaf extract for treating hypercholesterolaemia; Chocolate avoidance for preventing migraine; Etidronate for treating and preventing postmenopausal osteoporosis.

What is a systematic review?

A systematic review summarises the results of available carefully designed healthcare studies (controlled trials) and provides a high level of evidence on the effectiveness of healthcare interventions.

The review authors set about their task very methodically following, step by step, an advance plan that covers:

  • the way existing studies are found;
  • how the relevant studies are judged in terms of their usefulness in answering the review question;
  • how the results of the separate studies are brought together to give an overall measure of effectiveness (benefits and harms) – statistical techniques used to combine the results are called meta-analysis.

What is a protocol?

A protocol is the plan or set of steps to be followed in preparing a review. A protocol for a systematic review clearly describes why the review is needed (the review question), what the review is about (the healthcare context of the review), and how the review authors will go about developing the review. It details how they will seek studies, select those that are relevant, critically appraise them, and collect and analyse data from the included studies (combining data and checking their significance for the healthcare situation).

Cochrane protocols are published in the Cochrane Database of Systematic Reviews so that people can comment on them before the actual review has been carried out.

How do I know a healthcare intervention works?

The aim of a systematic review is to thoroughly assess, by means of a set procedure, the best possible evidence about the effects of a healthcare intervention or treatment in a particular healthcare situation.

Healthcare studies are generally designed to assess the benefits, rather than the harms, of an intervention. Studies generally have a relatively short designated time period. Any possible harms of an intervention may be expected to occur less frequently and over a longer period of time than the studies cover.

The process of a review is clearly defined, before starting the actual review of the literature, to minimise associations of expectations of effects and other sources of bias. Bias is a systematic ‘error’ or mistake in the judgments and decisions made that influence the results of a study or a review. Bias differs from a ‘placebo effect’, which is where participants of a study (or assessors of the outcomes) perceive a beneficial effect, or harm, with an inactive treatment.

Synthesising evidence

The specific methods used in a review are carefully set out by The Cochrane Collaboration and are described in each review.

A Cochrane review is prepared and maintained using specific methodologies described in the Cochrane Handbook .

Systematic reviews of randomised controlled trials provide the clearest evidence for the benefits of a healthcare intervention.

This is because the best way to assess the effects of a health care treatment is to use procedures that reduce the influence of chance effects and associations of cause and effect. Individual expectations on the part of a service provider, assessor and the person receiving an intervention can all contribute to modifying observed findings from a healthcare study. Randomised controlled trials where none of these people know the exact intervention a study participant is receiving (intervention under investigation, a placebo, or a comparator) may be expected to provide the best evidence.

Comparing groups can be misleading

By assessing the health of the two comparative groups in a study after their treatments, we can tell which intervention is more successful – but only if the two groups of people were very similar before treatment began. Otherwise we might be misled. For instance, one group may become healthier not because their treatment was better but because they were younger, not so ill, or at less risk of ill health before treatment began, or even self-selected a particular intervention because of a particular personality trait (for example, people who chose to take a hormone may have wanted to stay younger and be more active).

Randomised controlled trials

Randomised controlled trials are studies that are rigorously designed. People are allocated to intervention groups in a way that minimises the chances of predicting which treatment group a study participant is in. The intervention under investigation is compared against a well-known intervention or an inactive treatment (placebo). Studies are controlled so that participants have similar associated care in all ways other than the intervention. Ideally, depending on the type of intervention, the service provider is unaware of which group a participant is in and those assessing outcomes are also unaware – this is termed ‘blinding’.

The strength of evidence for a particular intervention can be increased further by systematically looking at (reviewing) all available randomized controlled trials that have been reported relevant to a particular healthcare situation.

It is important to search thoroughly for all studies

Many people are needed to properly test an intervention; this is more than can be recruited into a single trial, and it is also important to investigate the intervention in different populations. Furthermore, the technical aspects of a particular randomised controlled trial may not have been implemented properly, for one reason or another. The effects of these shortcomings can be minimised by grouping the results of a number of studies.

The results of randomised controlled trials may be published in any one of thousands of journals worldwide, and some studies are not published at all. In reality, the studies found most easily tend to have over-optimistic results, and finding reliable information about the effects of care is particularly difficult when there are negative results (the intervention is no better than placebo or another treatment). Sometimes published trials are too small to provide a conclusive result in their own right as to whether a treatment really does work. Consequently, to find out about a healthcare intervention it is worth searching the research literature thoroughly to see if the answer is already known. This may require considerable work over many months, but it will be much less work than conducting a new randomised controlled trial, and it will not unnecessarily exclude people from effective interventions through allocation to a placebo (or inactive treatment) group.

Discussions are underway in The Cochrane Collaboration as to how qualitative studies can be used to add to the information obtained from controlled studies – those that consider outcomes measured in numerical terms (and so are termed quantitative studies). Qualitative measures include ‘quality of life’ and lifestyle changes obtained from detailed questionnaires. Qualitative studies may also use narrative interviews, where participants are asked to talk about their experiences around sets of semi-structured questions and prompts to explore particular issues on which information is needed for a study.


What consumers can, and cannot, get from systematic reviews

Systematic reviews ask a very specific research question about a particular intervention in a clearly defined group of people who have a clear health condition or problem. Reviews provide powerful information on the state of knowledge about a healthcare intervention and whether that intervention is an effective treatment for a healthcare condition. Reviews:

  • cannot offer a guideline for treatment, especially if a person differs from those defined in the review. Individuals may have accompanying health problems, be in a different healthcare setting, or receive more than one intervention, for example;
  • follow stringent guidelines as to what types of studies are included and how healthcare measures of effectiveness can be expressed and combined;
  • may consider outcomes other than the one you are interested in and may not look at long term effects of an intervention;
  • may only find studies that are limited in the healthcare setting in which they take place;
  • may provide conclusions that are limited because of the question asked and/or the studies that were found.

Reviews are dependent on the availability of studies and the information these studies sought or obtained.

Healthcare studies differ dramatically in what they look for and how well they are carried out and, therefore, how much weight one can put on their conclusions. Part of the reason for performing systematic reviews is to reduce the effects of these shortcomings. Issues of conflict of interest and corporate funding of healthcare studies are also important considerations in drawing conclusions from any study.

Reviews are better suited to assess benefits rather than harms.

Well-designed healthcare studies generally set out to determine the efficacy of a healthcare intervention. Information on potential harms is less well investigated.

Carefully controlled studies take place over a limited period of time so that the researchers can account for all people who entered the study from beginning to the end of the study. Harms are generally less common than benefits and may be apparent over a different time period. This may be, for example, only in the long term so that the intervention would have to be given to more people for a long time period for adverse effects to be studied effectively.

Participants of studies are selected to reduce the risk of other problems interfering with the efficacy of an intervention. How selective this process is needs to be carefully considered when assessing the relevance of a study to an individual.

Randomised controlled trials are expensive to run. They are very time consuming and multiple factors may limit how many participants are involved, the outcomes measured and the length of the trial. How many people complete the study is also very important.

Levels of evidence for healthcare interventions

The National Health and Medical Research Council of Australia (1999) defines the ‘dimensions of evidence’ using three main areas.

1. Strength of the evidence

  • Level of evidence: the study design used – a systematic review of all relevant randomised controlled trials is the highest level, followed by at least one randomised controlled trial, then a pseudo-randomised trial
  • Quality of evidence: the methods used to minimise bias within a study design
  • Statistical precision: the degree of certainty about the existence of a true effect

2. Size of effect

How far the estimated intervention effect lies above a ‘no apparent effect’ value, for clinically relevant effects

3. Relevance of the evidence

How appropriate the outcome measure is for the healthcare problem, and its usefulness in measuring effectiveness of treatment

Using a measure of the variability of results – confidence intervals

Adapted from Oxman AD. Checklists for review articles. BMJ 1994;309:648-51.

Level I. For a randomised controlled trial, the lower limit of the confidence interval (expressed as a range) for a measure of effect is still above a meaningful benefit in healthcare terms

Level II. For a randomised controlled trial, the lower limit of the confidence interval (expressed as a range) for a measure of effect is less than a meaningful beneficial effect in healthcare terms; but the point estimate of effect still shows effectiveness of the intervention
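As a rough illustration, the distinction between Levels I and II above can be written as a small decision rule. The function below is a hypothetical sketch, assuming larger values mean greater benefit; real appraisal of evidence levels involves judgement beyond any such rule:

```python
def evidence_level(point_estimate, ci_lower, meaningful_benefit, no_effect=0.0):
    """Classify RCT evidence by where the confidence interval sits.

    Illustrative only. Assumes larger values mean more benefit.
    Level I: even the lower CI limit exceeds a meaningful benefit.
    Level II: the lower limit falls below that threshold, but the
    point estimate still shows effectiveness (above 'no effect').
    """
    if ci_lower > meaningful_benefit:
        return "Level I"
    if point_estimate > no_effect:
        return "Level II"
    return "inconclusive"

# A trial whose whole CI sits above a meaningful benefit of 0.1
level_a = evidence_level(point_estimate=0.4, ci_lower=0.3, meaningful_benefit=0.1)
# A trial whose CI dips below the threshold but still shows an effect
level_b = evidence_level(point_estimate=0.2, ci_lower=0.05, meaningful_benefit=0.1)
```

The thresholds here (`meaningful_benefit`, `no_effect`) are inputs a reader would have to supply; the NHMRC framework quoted above does not fix numerical values for them.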

Lower levels of evidence

Level III. Measures of effectiveness are taken from non-randomised studies of groups of people where a control group has run concurrently with the group receiving the intervention being assessed

Level IV. Measures of effectiveness are taken from non-randomised studies of groups of people where intervention effects are compared with previous or historical information

Level V. Evidence is from single case studies

Confidence interval (CI):

Even studies perfectly designed and carried out may show variable results because of the play of chance. A CI covers the likely range of the true effect. For example, the result of a study may be that 40 per cent (95% CI 30% to 50%) of people are helped by a treatment. That means we can be 95 per cent certain the true effect is between 30 and 50 per cent. (Smart Health Choices: How to Make Informed Health Decisions, by Judy Irwig, Les Irwig and Melissa Sweet, Allen and Unwin, 1999)
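To show where such a range comes from, the worked example (40%, 95% CI 30% to 50%) is roughly what a simple Wald interval gives for a trial of about 92 people. The trial size and counts below are hypothetical, chosen only so the numbers match the example:

```python
# Illustrative 95% confidence interval for a proportion (simple Wald method).
import math

def wald_ci_95(successes, n):
    """Return the observed proportion and its approximate 95% CI."""
    p = successes / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)  # standard error times 1.96
    return p, (p - half_width, p + half_width)

# Hypothetical trial: 37 of 92 participants helped (about 40 per cent)
p, (lo, hi) = wald_ci_95(37, 92)
```

With a larger trial the same observed proportion would give a narrower interval, which is the sense in which bigger studies reduce the "play of chance".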


Cochrane Groups

Cochrane review groups

Different groups exist for different health conditions: international Cochrane review groups cover important areas of health care, diseases and conditions. Review groups are responsible for producing and maintaining Cochrane reviews on specific healthcare questions. In The Cochrane Library you will see, for example, a Cochrane Consumers and Communication Group, a Cochrane Epilepsy Group, a Cochrane Heart Group and a Cochrane Pregnancy and Childbirth Group.

The activities of each group (or entity in Cochrane language) are monitored and co-ordinated by one person for each group, known as the managing editor (review group co-ordinator). This person manages the day to day running of the group and is usually the contact person. The co-ordinating editor leads the group and is responsible for the quality and subject of reviews.

Each group attracts members with a variety of backgrounds, experience and expertise, who contribute to the process of developing systematic reviews. They may be doctors, nurses, researchers, health advisers, consumers and caregivers.

Cochrane Fields

Fields cover health care in a broader sense than do review groups. These may include a major section of health care such as cancer, the setting of care (e.g. primary care), the type of patient/consumer (e.g. older persons), the type of provider (e.g. nurses), or the type of intervention (e.g. vaccines). The role of fields is to facilitate the work of collaborative review groups and to ensure that Cochrane reviews appropriate to an area of interest are both relevant and accessible to service providers and consumers.

Each field works to:

  • identify relevant healthcare trials and make them accessible in a specialised register;
  • ensure the proper representation of its specialist area of health care in review groups;
  • act as a liaison point between the entities within The Cochrane Collaboration and the specialist area of health care;
  • promote the accessibility of Cochrane reviews.

The principal contact person in a field is its field co-ordinator.

Cochrane Centres

Cochrane centres provide a range of services designed to support collaborative review groups in their area and to facilitate the review process. They serve as a regional source of information about The Cochrane Collaboration, provide support to Cochrane contributors within a defined geographical area and promote access to The Cochrane Library. Each centre has a director.

The Cochrane Consumer Network (CCNet)

The Consumer Network supports consumer participation within The Cochrane Collaboration, internationally. The Network is available to any active consumer. Its mission is to enable and support consumers in contributing to the work of collaborative review groups and other Cochrane entities. The Network enables communication with other consumers, provides a sense of belonging within The Cochrane Collaboration, and supports links to and dissemination of information from Cochrane reviews.

© The Cochrane Collaboration Comments about this page to: [email protected]


Cochrane Community

Living systematic reviews



What is a living systematic review?

We define an LSR as a systematic review which is continually updated, incorporating relevant new evidence as it becomes available.

Practically, this means that LSRs:

  • Are underpinned by continual, active monitoring of the evidence (e.g. monthly searches)
  • Immediately include any new important evidence (meaning data, studies or information) that is identified
  • Are supported by up-to-date communication about the status of the review, and any new evidence being incorporated

While core review methods are not fundamentally different to other Cochrane Reviews, an LSR should additionally include explicit, transparent and predefined decisions on:

  • How frequently new evidence is sought and screened
  • When and how new evidence is incorporated into the review

Why living systematic reviews?

Living systematic reviews (LSRs) provide a new approach to support the ongoing efforts of Cochrane and others to produce evidence that is both trustworthy and current. 

The concept of living evidence synthesis and related outputs, such as living guidelines, is of increasing interest to evidence producers, decision makers, guideline developers, funders and publishers as a way to seamlessly connect evidence and practice.

The possibility of a scaled-up living evidence approach has only recently come within reach, thanks to a number of technological and data-related innovations, such as online platforms, linked data and machine learning. Concurrently, research groups are embracing larger collaborations, open and shared data, and the growth of the citizen science movement, opening up the possibility of communities with a common interest maintaining high-value datasets and associated LSRs.

Cochrane LSR planning and methods support

Cochrane authors and review groups planning to undertake a living systematic review are encouraged to contact the Cochrane LSR team at [email protected]  for support with developing their protocol. We're here to help you work through the factors to consider when planning and conducting an LSR.

Living systematic review pilots

Project Transform evaluated LSRs as part of the Production Models component with the following groups:

  • Cochrane Gynaecological, Neuro-oncology and Orphan Cancers
  • Cochrane Heart
  • Cochrane Acute Respiratory Infections

An evaluation was undertaken alongside the pilots to understand the feasibility of LSRs and their implications for the people and processes involved, and to identify opportunities to refine the LSR model before scaling up. The final evaluation report is available to download here.

Following completion of the pilot, additional Cochrane teams are conducting LSRs, with the aim of at least one published LSR for each Network.

LSR guidance for production and publication

The purpose of this document is to outline the methods related to production and publication for LSRs published on The Cochrane Library. It is primarily designed as practical guidance for the authors, Review Groups and Central Editorial Unit staff involved in Cochrane LSRs. The approach described is revised guidance based on our review of current literature relevant to LSRs and consultation with a range of stakeholders, including the LSR Network, whose members come from within Cochrane and beyond.

LSR Webinars

  • Tech enablers for living evidence - Covidence & MAGICapp [slides] and video (18 September 2019)
  • Publishing living evidence [slides] and video (3 July 2019)
  • Tech enablers for living evidence – Screen4Me, RCT-classifier, RevMan Replicant, Systematic Review Accelerator (20 May 2019)
  • Living Network Meta-Analysis [slides] and video  (21 March 2019)
  • Getting better all the time: Considerations and approaches for LSR searching  [slides] and video (July 2018)
  • Practical experiences of piloting an LSR  (May 2018)
  • Living guideline recommendations: key challenges, questions and progress  (April 2018)
  • Introducing living systematic reviews  (March 2017)

Other LSR Resources

  • Presentations from Living Evidence Network meeting , Cochrane Colloquium, Edinburgh (15 September 2018)
  • Presentation from HTAi 2018 Advanced Information Retrieval on the Edge: LSRs (2 June 2018)
  • Presentations from Special Session: From living systematic reviews to living recommendations , Global Evidence Summit, Cape Town (14 Sep 2017)
  • Presentations from LSR Network meeting , Global Evidence Summit, Cape Town (12 Sep 2017)
  • Report from Cochrane Canada Symposium, LSR Workshop (13-14 May 2017)
  • Presentations from Cochrane Canada Symposium, LSR Workshop (13-14 May 2017)
  • Living systematic review methods symposium slides , Cochrane Colloquium, Seoul (26 October 2016)
  • Living Systematic Reviews: towards real-time evidence for healthcare decision making (12 May 2016)
  • Publicly available Mendeley Library with relevant references
  • Useful LSR references and resources
  • Expert Searching: Living Evidence and Living Systematic Reviews: What You Need to Know (17 December 2018)
  • Living proof of living evidence (25 June 2018)
  • From living systematic reviews to living recommendations (25 June 2018)
  • We've done it again! Fruit & veg LSR re-published in record time (22 May 2018)
  • Living systematic review re-published in record time (14 February 2018)
  • Living systematic review series published in Journal of Clinical Epidemiology (12 September 2017)
  • First two living systematic reviews now live on Cochrane Library! (8 September 2017)
  • LSRs: On the road with Annie and Julian (3 July 2017)
  • Living systematic reviews are going live (28 June 2017)

Living Evidence Network

The Living Evidence Network is an informal network that was launched in February 2016, with members including Cochrane and non-Cochrane researchers, policymakers and guideline developers.

The Living Evidence Network aims to: 

  • Share experiences, information and resources
  • Further the thinking on the living evidence concept and methods
  • Develop and refine approaches for Cochrane LSRs

You can download the Living Evidence Network governance structure document here . If you would like to join the Living Evidence Network, please email  [email protected]


Cochrane Switzerland

Systematic reviews

Cochrane systematic reviews provide reliable, evidence-based information on health issues.

A systematic review is the result of a rigorous scientific process consisting of several well-defined steps, including a systematic literature search, an evaluation of the quality of each included study and a synthesis, quantified or narrative, of the results obtained. The findings summarize the evidence on the efficacy of a treatment, the risk of adverse events or the accuracy of a diagnostic test, for example. However, sometimes the authors have to acknowledge that there is a lack of rigorous scientific studies.

1. What information can be found in a Cochrane systematic review?

2. How is a systematic review produced?

3. What are the main challenges encountered when producing systematic reviews?

4. Who prepares systematic reviews at Cochrane?

5. Where can I find Cochrane systematic reviews?

6. What is the difference between Cochrane systematic reviews and other systematic reviews?

7. Systematic reviews of other study types

8. What about observational studies?

All Cochrane Systematic Reviews answer a well-defined health question, such as the efficacy and safety of a surgical procedure or drug therapy, by considering all studies conducted on this question over time that meet established quality criteria. See also " How is a systematic review produced? "

The full text is a document, often of several dozen pages, divided into several parts, the main ones being:

  • A scientific abstract and a plain language summary, the latter very often translated into several languages;
  • The main sections of the review (background, objectives, methods, results, discussion, conclusion);
  • Tables describing the included studies (in detail), the excluded studies (with reason for exclusion) and the results of meta-analyses (if applicable);

  • Graphs, especially forest plots.

The development of a systematic review is a rigorous scientific process consisting of several steps:

  • Clearly define the question to be addressed; 
  • Search and identify all relevant references of clinical trials or other appropriate studies, published or unpublished, that aim to answer the review’s question; 
  • Assess the quality of each study using standardized tools;
  • Extract and organize relevant data from the included publications and other sources of information;
  • Prepare an appropriate synthesis of the extracted results. If the data permit, perform a statistical analysis called a meta-analysis, which is used to combine quantified findings from several studies into a single pooled estimate.
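The "single pooled estimate" mentioned in the last step is usually an inverse-variance weighted average of the per-study results. As a rough illustration only (not Cochrane's actual software; the function name and the trial numbers below are made up), a fixed-effect pooling might look like:

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect
    estimates, e.g. log odds ratios. Each study is weighted by the
    inverse of its variance, so more precise studies count for more."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical trials reporting log odds ratios:
log_ors = [-0.4, -0.2, -0.3]
ses = [0.20, 0.15, 0.25]
est, se = fixed_effect_pool(log_ors, ses)
ci95 = (est - 1.96 * se, est + 1.96 * se)
```

Note that the pooled standard error (about 0.11 here) is smaller than that of any individual study, which is the sense in which combining studies yields a more precise overall estimate.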

One of Cochrane's principles is to avoid redundancy as much as possible. To do this, the Cochrane Review Group in charge verifies that no other Cochrane review addressing the same question already exists. If none does and other criteria are met, the title of the new review can be registered. For more information see the page " Become an author " on cochrane.org.

  • Not all relevant studies answering the same specific health question are necessarily published, for example, when their results do not support the efficacy of a new treatment. Sometimes only part of the results are published, for example, only the outcomes with statistically significant differences. Cochrane supports the AllTrials initiative for the publication of all clinical trials and their full methods and results ( www.alltrials.net ).
  • Certain publications do not describe the study methods with sufficient detail to allow for critical review and evaluation. 
  • Studies are often carried out under "ideal" conditions that do not account for factors compromising the efficacy of a treatment in routine care, such as patients’ co-morbidity or non-compliance with therapy. 

The workload and time required to prepare and complete a Cochrane Systematic Review project is frequently underestimated, especially if the contributors are inexperienced.

Systematic reviews are conducted by health professionals or scientists, often with the ad hoc support of patients or patients' relatives. To prepare the review, the authors collaborate closely with the appropriate thematic review group (Cochrane Review Group), which ensures the editorial follow-up during the registration of the title, the writing of the protocol, the implementation of each step of the review and the publication in the Cochrane Library. The authors of Cochrane reviews are not funded by Cochrane but very often by public funds. Funding from the commercial sector is not accepted. 

The Cochrane Review Groups are organised into 8 networks and ensure that the rigorous quality standards that have built Cochrane’s reputation are maintained. Both the protocol and the full Cochrane review are peer-reviewed prior to publication in the Cochrane Library .

Systematic reviews produced by Cochrane are published in the Cochrane Library (www.cochranelibrary.com/). In addition to the Cochrane reviews, this online library also includes the CENTRAL database with references to controlled studies identified in PubMed, EMBASE and through manual searches ("hand-search"), and information about Cochrane. 

Under the label "Cochrane Clinical Answers", a selection of systematic reviews of wider interest (in particular, in primary care medicine) is presented in a question-and-answer format with interactive tables facilitating rapid access to the results. 

In Switzerland, all content of the Cochrane Library is freely accessible through a national license.

Cochrane Systematic Reviews are all built according to the same scheme: all the steps of their conduct are well described, and all the choices made during the process are outlined. This transparency helps readers understand what options were taken and why they were chosen.

In addition, Cochrane Systematic Reviews are regularly updated, which is rarely the case for systematic reviews published elsewhere. These updates are performed as needed, for example, when a significant number of new studies have been published. Updating is important to ensure that the latest clinical research is taken into account.

Cochrane Systematic Reviews address questions other than the efficacy and safety of therapy. An important area is the performance and accuracy of diagnostic tests. A diagnostic test is a test, such as a laboratory test, imaging technique or clinical examination, performed on a person with a suspected disease or condition. It is used to confirm or rule out the presence of that disease or condition and should lead to a therapeutic decision (whether and which treatment to undertake). Cochrane has set standards for the development of "diagnostic reviews" and introduced this type of review as a routine process from 2008 onwards. One of the methods groups ( Cochrane Screening and Diagnostic Test Methods Group ) monitors ongoing methods development and supports the author groups conducting this type of review.

Over time, other types of reviews have been introduced. At present, these reviews are still few in number and are not part of Cochrane’s routine processes. They include reviews of prognostic studies, qualitative evidence syntheses and living systematic reviews.

To assess the efficacy of a medical intervention, the results of randomized clinical trials are central to the analysis. However, they often provide little evidence of safety, especially of serious adverse events. Typically, observational studies include larger, less selected populations with longer follow-up periods. In a systematic review of interventions, it is advisable to include the results of good quality observational studies to gain a more complete picture of the benefits (effectiveness) and risks of an intervention.

Some other relevant questions can only be answered using the results of observational studies. For example, to determine the prevalence of a disease based on estimates made in different countries, a systematic review based on data from population-based studies, e.g. cross-sectional studies, might be conducted.  


Traditional reviews vs. systematic reviews

Posted on 3rd February 2016 by Weyinmi Demeyin


Millions of articles are published yearly (1) , making it difficult for clinicians to keep abreast of the literature. Reviews of literature are necessary in order to provide clinicians with accurate, up-to-date information to ensure appropriate management of their patients. Reviews usually involve summaries and synthesis of primary research findings on a particular topic of interest and can be grouped into two main categories: the ‘traditional’ review and the ‘systematic’ review, with major differences between them.

Traditional reviews provide a broad overview of a research topic with no clear methodological approach (2) . Information is collected and interpreted unsystematically, with subjective summaries of findings. Authors aim to describe and discuss the literature from a contextual or theoretical point of view. Although such reviews may be conducted by topic experts, they can be subject to bias arising from preconceived ideas or conclusions.

Systematic reviews are overviews of the literature undertaken by identifying, critically appraising and synthesising results of primary research studies using an explicit, methodological approach (3) . They aim to summarise the best available evidence on a particular research topic.

The main differences between traditional reviews and systematic reviews are summarised below in terms of the following characteristics: Authors, Study protocol, Research question, Search strategy, Sources of literature, Selection criteria, Critical appraisal, Synthesis, Conclusions, Reproducibility, and Update.

Traditional reviews

  • Authors: One or more authors usually experts in the topic of interest
  • Study protocol: No study protocol
  • Research question: Broad to specific question, hypothesis not stated
  • Search strategy: No detailed search strategy, search is probably conducted using keywords
  • Sources of literature: Not usually stated and non-exhaustive, usually well-known articles. Prone to publication bias
  • Selection criteria: No specific selection criteria, usually subjective. Prone to selection bias
  • Critical appraisal: Variable evaluation of study quality or method
  • Synthesis: Often qualitative synthesis of evidence
  • Conclusions: Sometimes evidence based but can be influenced by author’s personal belief
  • Reproducibility: Findings cannot be reproduced independently as conclusions may be subjective
  • Update: Cannot be continuously updated

Systematic reviews

  • Authors: Two or more authors are involved in good quality systematic reviews, may comprise experts in the different stages of the review
  • Study protocol: Written study protocol which includes details of the methods to be used
  • Research question: Specific question which may have all or some of PICO components (Population, Intervention, Comparator, and Outcome). Hypothesis is stated
  • Search strategy: Detailed and comprehensive search strategy is developed
  • Sources of literature: List of databases, websites and other sources of included studies are listed. Both published and unpublished literature are considered
  • Selection criteria: Specific inclusion and exclusion criteria
  • Critical appraisal: Rigorous appraisal of study quality
  • Synthesis: Narrative, quantitative or qualitative synthesis
  • Conclusions: Conclusions drawn are evidence based
  • Reproducibility: Accurate documentation of method means results can be reproduced
  • Update: Systematic reviews can be periodically updated to include new evidence

Decisions and health policies about patient care should be evidence based in order to provide the best treatment for patients. Systematic reviews provide a means of systematically identifying and synthesising the evidence, making it easier for policy makers and practitioners to assess such relevant information and hopefully improve patient outcomes.

  • Fletcher RH, Fletcher SW. Evidence-Based Approach to the Medical Literature. Journal of General Internal Medicine. 1997; 12(Suppl 2):S5-S14. doi:10.1046/j.1525-1497.12.s2.1.x. Available from:  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1497222/
  • Rother ET. Systematic literature review X narrative review. Acta paul. enferm. [Internet]. 2007 June [cited 2015 Dec 25]; 20(2): v-vi. Available from: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0103-21002007000200001&lng=en. http://dx.doi.org/10.1590/S0103-21002007000200001
  • Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews. NHS Centre for Reviews and Dissemination; 2001.


Weyinmi Demeyin



The information is very much valuable, a lot is indeed expected in order to master systematic review


Thank you very much for the information here. My question is : Is it possible for me to do a systematic review which is not directed toward patients but just a specific population? To be specific can I do a systematic review on the mental health needs of students?


Hi Rosemary, I wonder whether it would be useful for you to look at Module 1 of the Cochrane Interactive Learning modules. This is a free module, open to everyone (you will just need to register for a Cochrane account if you don’t already have one). This guides you through conducting a systematic review, with a section specifically around defining your research question, which I feel will help you in understanding your question further. Head to this link for more details: https://training.cochrane.org/interactivelearning

I wonder if you have had a search on the Cochrane Library as yet, to see what Cochrane systematic reviews already exist? There is one review, titled “Psychological interventions to foster resilience in healthcare students” which may be of interest: https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD013684/full You can run searches on the library by the population and intervention you are interested in.

I hope these help you start in your investigations. Best wishes. Emma.


Is a systematic review valid if there is only one author?

Hi Alex, so sorry for the delay in replying to you. Yes, that is a very good point. I have copied a paragraph from the Cochrane Handbook here, which does say that for a Cochrane Review, you should have more than one author.

“Cochrane Reviews should be undertaken by more than one person. In putting together a team, authors should consider the need for clinical and methodological expertise for the review, as well as the perspectives of stakeholders. Cochrane author teams are encouraged to seek and incorporate the views of users, including consumers, clinicians and those from varying regions and settings to develop protocols and reviews. Author teams for reviews relevant to particular settings (e.g. neglected tropical diseases) should involve contributors experienced in those settings”.

Thank you for the discussion point, much appreciated.


Hello, I’d like to ask you a question: what’s the difference between a systematic review and a systematized review? In addition, if the screening process of the review was done by only one author, is it still a systematic review or a systematized review? Thanks

Hi. This article from Grant & Booth is a really good one to look at explaining different types of reviews: https://onlinelibrary.wiley.com/doi/10.1111/j.1471-1842.2009.00848.x It includes Systematic Reviews and Systematized Reviews. In answer to your second question, have a look at this Chapter from the Cochrane handbook. It covers the question about ‘Who should do a systematic review’. https://training.cochrane.org/handbook/current/chapter-01

A really relevant part of this chapter is this: “Systematic reviews should be undertaken by a team. Indeed, Cochrane will not publish a review that is proposed to be undertaken by a single person. Working as a team not only spreads the effort, but ensures that tasks such as the selection of studies for eligibility, data extraction and rating the certainty of the evidence will be performed by at least two people independently, minimizing the likelihood of errors.”

I hope this helps with the question. Best wishes. Emma.



VIDEO: What are systematic reviews?

What are systematic reviews?

A systematic review attempts to identify, appraise and synthesize all the empirical evidence that meets pre-specified eligibility criteria to answer a specific research question. Researchers conducting systematic reviews use explicit, systematic methods that are selected with a view to minimizing bias, to produce more reliable findings to inform decision making. 

Here is a video from Cochrane Consumers and Communication that explains clearly and simply what a systematic review is, for people who may not be familiar with the concepts and terminology of systematic reviews: what they are, how researchers prepare them, and why they’re an important part of making informed decisions about health - for everyone. 

Cochrane evidence provides a powerful tool to enhance your healthcare knowledge and decision making. This video from Cochrane Sweden explains a bit about how we create health evidence, including systematic reviews, and other activities of Cochrane. 

  • What is the difference between a Cochrane systematic review of interventions and a Cochrane diagnostic test accuracy review?
  • Learn more about Cochrane and our work
  • Join Cochrane


Systematic Reviews and Other Evidence Synthesis Types Guide

  • Systematic Review and Other Evidence Synthesis Types
  • Types of Evidence Synthesis
  • Evidence Synthesis Comparison
  • Are You Ready to Conduct an Evidence Synthesis?
  • UT Southwestern Evidence Synthesis Services
  • Task 1 - Find Articles
  • Task 2 - Formulate Question
  • Task 3 - Select Reporting Guideline
  • Task 4 - Write and Register Protocol
  • Evidence Synthesis - Search (Task 5)
  • Screen and Appraise (Tasks 6 – 11)
  • Synthesize (Tasks 12 – 15)
  • Write Up Review (Task 16)

Systematic Review or Meta-Analysis

  • Integrative Review
  • Narrative/Literature Review
  • Rapid Review
  • Scoping Review
  • Umbrella Review

Request UT Southwestern Library Evidence Synthesis/Systematic Review Services

The UT Southwestern Librarians provide two levels of Evidence Synthesis/Systematic Review (ES/SR) support.

Level 1 – Education (No Cost)

  • A librarian will provide training about the systematic review process.
  • Use the Training Request Form .

Level 2 – Librarian As ES/SR Team Member and Co-Author (Fee-Based)

  • The librarian is an active contributor. This service is available to:
  • UT Southwestern faculty
  • UT Southwestern residents or fellows
  • UT Southwestern Medical Center and University Hospitals clinicians
  • Begin by completing the Evidence Synthesis/Systematic Review Request Form . For more information on the fees ($1,250 per PICO or equivalent question), see the "Costs" section in the form.
  • If a Librarian joins the ES/SR Team, the ES/SR Team will complete the Evidence Synthesis/Systematic Review Library Services Agreement .
  • Contact LibAsk to schedule an appointment with UT Southwestern librarians.


  • Public Health Systematic Review Guidelines
  • Electronic Books

Systematic Review – seeks to systematically search for, appraise and synthesize research evidence on a specific question, often adhering to guidelines on the conduct of a review.

Meta-analysis – a technique that statistically combines the results of quantitative studies to provide a more precise effect of the results. A good systematic review is essential to a meta-analysis of the literature.
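To make the "statistically combines" idea concrete, the sketch below computes Cochran's Q and the I² statistic, which quantify how much a set of study results disagree beyond what chance alone would produce. This is an illustrative sketch with made-up numbers, not part of any Cochrane or library tooling:

```python
def heterogeneity(estimates, std_errors):
    """Cochran's Q and the I^2 statistic for a set of study effect
    estimates: Q measures observed variation around the inverse-variance
    pooled estimate; I^2 expresses the share of that variation not
    attributable to chance (0-100%)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Three hypothetical studies with equal precision but divergent effects:
q, i2 = heterogeneity([0.1, 0.5, 0.9], [0.2, 0.2, 0.2])  # q = 8.0, i2 = 75.0
```

High I² is one signal that a single pooled effect may be misleading, which is why a good systematic review is described above as essential groundwork for a meta-analysis.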

Standards (see the Books tab) and guidelines have been developed on how to conduct and report systematic reviews and meta-analyses.

Guidelines and Best Practices

  • Cochrane Handbook for Systematic Reviews of Interventions, Current Version While this Handbook focuses on systematic reviews of interventions, Cochrane publishes five main types of systematic reviews , and has developed a rigorous approach to the preparation of each of the following: ❖ Effects of interventions ❖ Diagnostic test accuracy ❖ Prognosis ❖ Reviews of reviews (umbrella reviews) ❖ Reviews of methodology. Part 3 provides considerations for tackling systematic reviews from different perspectives, such as when thinking about specific populations, complex interventions, or particular types of outcomes. It comprises the following chapters: 16. Equity; 17. Intervention complexity; 18. Patient-reported outcomes; 19. Adverse effects; 20. Economic evidence; 21. Qualitative evidence
  • MECIR Manual The MECIR Standards present a guide to the conduct of new Cochrane Intervention Reviews, and the planning and conduct of updates. This online version will be kept up to date; a PDF of each section can be generated. All substantive changes will be noted here .
  • Campbell Collaboration An international social science research network that produces high quality, open and policy-relevant evidence syntheses, plain language summaries and policy briefs.

Reporting Guidelines

  • PRISMA 2020 Statement An evidence-based minimum set of items for reporting in systematic reviews and meta-analyses, PRISMA primarily focuses on the reporting of reviews evaluating the effects of interventions, but can also be used as a basis for reporting systematic reviews with objectives other than evaluating interventions (e.g. evaluating etiology, prevalence, diagnosis or prognosis). The PRISMA 2020 Statement is accompanied by the PRISMA 2020 Explanation and Elaboration paper.
  • PRISMA 2020 Checklist The 27 checklist items pertain to the content of a systematic review and meta-analysis, which include the title, abstract, methods, results, discussion and funding. Note: As a member of the ES/SR Team, the UT Southwestern Librarian completes Item 7 (Search Strategy) in the checklist.
  • PRISMA Flow Diagram The flow diagram depicts the flow of information through the different phases of a systematic review. It maps out the number of records identified, included and excluded, and the reasons for exclusions. Different templates are available depending on the type of review (new or updated) and sources used to identify studies.
  • PRISMA for Searching Published in 2021, the checklist includes 16 reporting items, each of which is detailed with exemplar reporting and rationale. The intent of PRISMA-S is to complement the PRISMA Statement and its extensions by providing a checklist that could be used by interdisciplinary authors, editors, and peer reviewers to verify that each component of a search is completely reported and therefore reproducible. For additional information, refer to the PRISMA for searching statement/exploratory paper .

Protocol Guidelines

  • PRISMA for Systematic Review Protocols (PRISMA-P) PRISMA-P, published in 2015, includes a 17-item checklist intended to facilitate the preparation and reporting of a robust protocol for the systematic review. The developers note that there are many review types outside of this scope. They recommend that due to the general lack of protocol guidance for other types of reviews, reviewers preparing any type of review protocol make use of PRISMA-P as applicable.

Protocol Registration

  • PROSPERO An international prospective register of systematic reviews. Key details from new Cochrane protocols are automatically uploaded into PROSPERO. It is produced by the Centre of Reviews and Dissemination, University of York, United Kingdom.

The Cochrane Library includes:

  • Cochrane Database of Systematic Reviews – peer-reviewed systematic reviews and protocols
  • Cochrane Central Register of Controlled Trials (CENTRAL) – reports of randomized and quasi-randomized controlled trials
  • Cochrane Clinical Answers (CCAs) – developed to inform point-of-care decision-making; each CCA contains a clinical question, a short answer, and relevant outcomes data for the clinician
  • JBI Systematic Review Register Members of the JBI Collaboration can register their review titles with JBI via completion of the online Systematic Review Title Registration Form. Once titles become registered with JBI, they are listed on the website. Titles are subsequently removed when the full protocol is publicly available, either published or posted to an accessible website.
  • Cumpston, M. S., McKenzie, J. E., Welch, V. A., & Brennan, S. E. (2022). Strengthening systematic reviews in public health: guidance in the Cochrane Handbook for Systematic Reviews of Interventions, 2nd edition. J Public Health (Oxf), 44(4), e588-e592. https://doi.org/10.1093/pubmed/fdac036
  • Jackson, N., & Waters, E. (2005). Criteria for the systematic review of health promotion and public health interventions. Health Promotion International, 20(4), 367-374. https://doi.org/10.1093/heapro/dai022
  • Thomas, B. H., Ciliska, D., Dobbins, M., & Micucci, S. (2004). A process for systematically reviewing the literature: providing the research evidence for public health nursing interventions. Worldviews on Evidence-Based Nursing, 1(3), 176-184. https://doi.org/10.1111/j.1524-475X.2004.04006.x


Should I undertake a scoping review or a systematic review? (Ask JBI) on YouTube (12:43).

Agency for Healthcare Research and Quality

  • Training Modules for the Systematic Reviews Methods Guide (Agency for Healthcare Research and Quality)

Campbell Collaboration and the Open Learning Initiative

  • Systematic Reviews and Meta-Analysis Open & Free (Carnegie Mellon University) Provides an overview of the steps involved in conducting a systematic (scientific) review of results of multiple quantitative studies.
  • Cochrane Collaboration Online Training Includes links to learning resources relevant to systematic reviews and evidence-based medicine
  • Cochrane Methodology Learning resources on key areas of Cochrane review methodology.

Joanna Briggs Institute

  • JBI SUMARI Knowledge Base

Johns Hopkins University/Coursera

  • Introduction to Systematic Review and Meta-Analysis (Johns Hopkins University)

University of North Carolina Health Sciences Library

  • Introduction to Conducting a Systematic Review Workshop (University of North Carolina Health Sciences Library) Used with permission from the Systematic Reviews LibGuide developed by the University of North Carolina Health Sciences Library.
  • Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., … PRISMA-P Group (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Systematic reviews, 4(1), 1. https://doi.org/10.1186/2046-4053-4-1
  • Page, M. J., Moher, D., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., McGuinness, L. A., Stewart, L. A., Thomas, J., Tricco, A. C., Welch, V. A., Whiting, P., & McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160
  • Rethlefsen, M. L., Kirtley, S., Waffenschmidt, S., Ayala, A. P., Moher, D., Page, M. J., Koffel, J. B., & PRISMA-S Group (2021). PRISMA-S: an extension to the PRISMA Statement for Reporting Literature Searches in Systematic Reviews. Systematic reviews, 10(1), 39. https://doi.org/10.1186/s13643-020-01542-z
  • Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA, the PRISMA-P Group. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 2015;349:g7647. https://doi.org/10.1136/bmj.g7647
  • Last Updated: Sep 24, 2024 12:06 PM
  • URL: https://utsouthwestern.libguides.com/sres




Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Abstract

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly, [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometric increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortia of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

 Cochrane (formerly Cochrane Collaboration)
 JBI (formerly Joanna Briggs Institute)
 National Institute for Health and Care Excellence (NICE)—United Kingdom
 Scottish Intercollegiate Guidelines Network (SIGN) —Scotland
 Agency for Healthcare Research and Quality (AHRQ)—United States

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods rather than simply copying citations from previous or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19, [ 2 , 42 ] but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or unintentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1 ). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Review type: topic assessed; elements of research question (mnemonic)

Intervention [ , ]: Benefits and harms of interventions used in healthcare. Population, Intervention, Comparator, Outcome (PICO)

Diagnostic test accuracy [ ]: How well a diagnostic test performs in diagnosing and detecting a particular disease. Population, Index test(s), and Target condition (PIT)

Qualitative
 Cochrane [ ]: Questions are designed to improve understanding of intervention complexity, contextual variations, implementation, and stakeholder preferences and experiences.
  Setting, Perspective, Intervention or Phenomenon of Interest, Comparison, Evaluation (SPICE)
  Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER)
  Perspective, Setting, Phenomena of interest/Problem, Environment, Comparison (optional), Time/timing, Findings (PerSPecTiF)
 JBI [ ]: Questions inform meaningfulness and appropriateness of care and the impact of illness through documentation of stakeholder experiences, preferences, and priorities. Population, the Phenomena of Interest, and the Context (PICo)

Prognostic [ ]: Probable course or future outcome(s) of people with a health problem. Population, Intervention (model), Comparator, Outcomes, Timing, Setting (PICOTS)

Etiology and risk [ ]: The relationship (association) between certain factors (e.g., genetic, environmental) and the development of a disease or condition or other health outcome. Population or groups at risk, Exposure(s), associated Outcome(s) (disease, symptom, or health condition of interest), the context/location or the time period and the length of time when relevant (PEO)

Measurement properties [ , ]: What is the most suitable instrument to measure a construct of interest in a specific study population? Population, Instrument, Construct, Outcomes (PICO)

Prevalence and incidence [ ]: The frequency, distribution and determinants of specific factors, health states or conditions in a defined population: eg, how common is a particular disease or condition in a specific group of individuals? Factor, disease, symptom or health Condition of interest, the epidemiological indicator used to measure its frequency (prevalence, incidence), the Population or groups at risk, as well as the Context/location and time period where relevant (CoCoPop)

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

Cochrane a (total = 8900)
 Intervention: 8572 (96.3%)
 Diagnostic: 176 (1.9%)
 Overview: 64 (0.7%)
 Methodology: 41 (0.45%)
 Qualitative: 17 (0.19%)
 Prognostic: 11 (0.12%)
 Rapid: 11 (0.12%)
 Prototype c: 8 (0.08%)

JBI b (total = 707)
 Effectiveness: 435 (61.5%)
 Diagnostic Test Accuracy: 9 (1.3%)
 Umbrella: 4 (0.6%)
 Mixed Methods: 2 (0.3%)
 Qualitative: 159 (22.5%)
 Prevalence and Incidence: 6 (0.8%)
 Etiology and Risk: 7 (1.0%)
 Measurement Properties: 3 (0.4%)
 Economic: 6 (0.6%)
 Text and Opinion: 1 (0.14%)
 Scoping: 43 (6.0%)
 Comprehensive d: 32 (4.5%)

a Data from https://www.cochranelibrary.com/cdsr/reviews . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.

Fig. 1 Distinguishing types of research evidence

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].

Recommendations

We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

 Quality of Reporting of Meta-analyses (QUOROM) Statement: Moher 1999 [ ]
 Meta-analyses Of Observational Studies in Epidemiology (MOOSE): Stroup 2000 [ ]
 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA): Moher 2009 [ ]
 PRISMA 2020 a: Page 2021 [ ]
 Overview Quality Assessment Questionnaire (OQAQ): Oxman and Guyatt 1991 [ ]
 Systematic Review Critical Appraisal Sheet: Centre for Evidence-based Medicine 2005 [ ]
 A Measurement Tool to Assess Systematic Reviews (AMSTAR): Shea 2007 [ ]
 AMSTAR-2 a: Shea 2017 [ ]
 Risk of Bias in Systematic Reviews (ROBIS) a: Whiting 2016 [ ]

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct

Development.

AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.

Description

Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].

Comparison of AMSTAR-2 and ROBIS

Characteristic: AMSTAR-2 | ROBIS

Accompanying guidance: Extensive | Extensive

Review types covered: Intervention | Intervention, diagnostic, etiology, prognostic a

Domains: 7 critical, 9 non-critical | 4

Items
 Total number: 16 | 29
 Response options (AMSTAR-2):
  Items # 1, 3, 5, 6, 10, 13, 14, 16: rated Yes or No
  Items # 2, 4, 7, 8, 9 b: rated Yes, Partial Yes, or No
  Items # 11 b, 12, 15: rated Yes, No, or No meta-analysis conducted
 Response options (ROBIS):
  24 assessment items: rated Yes, Probably Yes, Probably No, No, or No Information
  5 items regarding level of concern: rated Low, High, or Unclear

Overall rating
 Construct: Confidence based on weaknesses in critical domains | Level of concern for risk of bias
 Categories: High, moderate, low, critically low | Low, high, unclear

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 items #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.

Application

A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate if the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

PRISMA-E [ ] (2012): PRISMA for systematic reviews with a focus on health equity
PRISMA for Abstracts [ ] (2015; 2020): reporting systematic reviews in journal and conference abstracts
PRISMA-P [ ] (2015): PRISMA for systematic review protocols
PRISMA-NMA [ ] (2015): PRISMA for network meta-analyses
PRISMA-IPD [ ] (2015): PRISMA for individual participant data
PRISMA-Harms [ ] (2016): PRISMA for reviews including harms outcomes
PRISMA-DTA [ ] (2018): PRISMA for diagnostic test accuracy
PRISMA-ScR [ ] (2018): PRISMA for scoping reviews
PRISMA-A [ ] (2019): PRISMA for acupuncture
PRISMA-S [ ] (2021): PRISMA for reporting literature searches

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review, as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, and its methodological choices, good or bad, are fixed. Moreover, PRISMA checklists evaluate how completely an element of review conduct was reported, but do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not assume that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

Review component (AMSTAR-2 item / ROBIS item): expectation. Rationale.
- Methods for study selection (#5 / #2.5), methods for data extraction (#6 / #3.1), and methods for RoB assessment (NA / #3.5): all three components must be done in duplicate, and methods fully described. Helps to mitigate CoI and bias; also may improve accuracy.
- Study description (#8 / #3.2): research design features, components of the research question (eg, PICO), setting, funding sources. Allows readers to understand the individual studies in detail.
- Sources of funding (#10 / NA): identified for all included studies. Can reveal CoI or bias.
- Publication bias (#15* / #4.5): explored, diagrammed, and discussed. Publication and other selective reporting biases are major threats to the validity of systematic reviews.
- Author CoI (#16 / NA): disclosed, with management strategies described. If CoI is identified, management strategies must be described to ensure confidence in the review.

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved in developing different types of systematic reviews; however, while inclusion of the suggested elements aligns a review with the methods of a particular review type, it does not necessarily make a research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope that allows them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

Acronym: meaning
- FINER a : feasible, interesting, novel, ethical, and relevant
- SMART b : specific, measurable, attainable, relevant, timely
- TOPICS+M c : time, outcomes, population, intervention, context, study design, plus (effect) moderators

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association with attainment of AMSTAR standards in systematic reviews with published prospective protocols has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

 BMJ Open
 BioMed Central
 JMIR Research Protocols
 World Journal of Meta-analysis
 Cochrane
 JBI
 PROSPERO

 Research Registry: Registry of Systematic Reviews/Meta-Analyses

 International Platform of Registered Systematic Review and Meta-analysis Protocols (INPLASY)
 Center for Open Science
 Protocols.io
 Figshare
 Open Science Framework
 Zenodo

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. The gray literature and a search of trial registries may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in the gray literature and in trial registries [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search; if that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics may warrant even less of a delay; in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of its RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB 2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards for RoB assessment. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.
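To make the sensitivity analysis described above concrete, the sketch below pools effect estimates with and without studies judged at high RoB, using a standard fixed-effect inverse-variance average. All study values and RoB judgments are hypothetical, invented purely for illustration; this is not a substitute for a full meta-analysis workflow.

```python
def pooled_estimate(pairs):
    """Fixed-effect inverse-variance pooled estimate for (effect, variance) pairs."""
    weights = [1 / v for _, v in pairs]
    return sum(w * e for w, (e, _) in zip(weights, pairs)) / sum(weights)

# Hypothetical studies: (log risk ratio, variance, overall RoB judgment)
studies = [(-0.5, 0.04, "low"), (-0.4, 0.06, "some concerns"), (-1.2, 0.10, "high")]

all_pooled = pooled_estimate([(e, v) for e, v, _ in studies])
low_rob = pooled_estimate([(e, v) for e, v, rob in studies if rob != "high"])
print(f"All studies: {all_pooled:.3f}; high-RoB study excluded: {low_rob:.3f}")
```

If the two pooled values differ materially, as they do in this invented example, readers gain a more accurate sense of how fragile the summary estimate is to the studies at high RoB.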

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

Meta-analysis of aggregate data or individual participant data c
- Analyses: weighted average of effect estimates; pairwise comparisons of effect estimates with CI; overall effect estimate, CI, p value; evaluation of heterogeneity
- Presentation: forest plot with summary statistic for the average effect estimate b

Network meta-analysis (the interventions compared, directly and indirectly, are variable)
- Analyses: comparisons of relative effects between any pair of interventions; summary relative effects for pair-wise comparisons with evaluations of inconsistency and heterogeneity; treatment rankings (ie, probability that an intervention is among the best options)
- Presentation: network diagram or graph, tabular presentations; effect estimates for intervention pairings; forest plot, other methods; rankogram plot

Synthesis without meta-analysis e
- Summarizing effect estimates from separate studies (without combination that would provide an average effect estimate): range and distribution of observed effects such as median, interquartile range, range; presented in box-and-whisker plots, bubble plots, or forest plots (without a summary effect estimate)
- Combining p values: combined p value, number of studies; presented in an albatross plot (study sample size against p values per outcome)
- Vote counting by direction of effect (eg, favors intervention over the comparator): proportion of studies with an effect in the direction of interest, CI, p value; presented in a harvest plot or effect direction plot

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Meta-analysis

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When applied appropriately, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity among study estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard meta-analysis combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).
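For readers unfamiliar with the arithmetic behind a “weighted average of effect estimates,” the following sketch implements a textbook fixed-effect inverse-variance meta-analysis, together with Cochran’s Q and the I² heterogeneity statistic. The three study estimates (log odds ratios) and variances are hypothetical, and the function is a minimal illustration rather than production meta-analysis software.

```python
import math

def inverse_variance_meta(estimates, variances):
    """Fixed-effect (inverse-variance) meta-analysis of study effect estimates.
    Returns the pooled estimate, its 95% CI, Cochran's Q, and I^2 (%)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled estimate
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    # Heterogeneity: Q compares each study to the pooled value, weighted
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, q, i2

# Hypothetical log odds ratios and variances from three studies
log_or = [-0.4, -0.2, -0.6]
var = [0.05, 0.08, 0.12]
pooled, ci, q, i2 = inverse_variance_meta(log_or, var)
print(f"Pooled log OR = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"Q = {q:.2f} on {len(log_or) - 1} df, I^2 = {i2:.1f}%")
```

Note that each study’s weight is the reciprocal of its variance, so larger, more precise studies dominate the pooled estimate, which is the intuition a forest plot conveys with its varying box sizes.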

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, such imprecise terminology is discouraged: some narration or description is needed to supplement the data visually presented in tabular or graphic form in any type of synthesis [ 63 , 177 ], and the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentation of the data in tables and plots. In comparison to narrative descriptions of each study, these are designed to show patterns and convey detailed information about the data more effectively and transparently; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize that these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias than an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
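
The acceptable direction-of-effect approach can be sketched as follows; the study directions are hypothetical, and the point is that the tally rests on direction alone, not on statistical significance or effect magnitude:

```python
from math import comb

def vote_count_direction(directions):
    """Vote counting based on direction of effect (not significance).

    `directions` holds +1 for each study whose estimate favors the
    intervention and -1 for each favoring the comparator; studies with no
    estimable direction are excluded beforehand. Returns the proportion
    favoring the intervention and a two-sided exact sign-test p-value
    against the null hypothesis of an even 50:50 split.
    """
    n = len(directions)
    k = sum(1 for d in directions if d > 0)
    prop = k / n
    # Double the smaller binomial tail (p = 0.5 under the null)
    tail = min(k, n - k)
    p = min(1.0, 2 * sum(comb(n, i) for i in range(tail + 1)) / 2**n)
    return prop, p

# Hypothetical: 9 of 11 studies report an effect favoring the intervention
prop, p = vote_count_direction([+1] * 9 + [-1] * 2)
print(f"{prop:.0%} of studies favor the intervention (sign test p = {p:.3f})")
```

Even with 9 of 11 studies in the same direction, the sign test does not reach conventional significance in this example, which illustrates why vote counting conveys only the direction, never the size, of an effect.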

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability to different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, use of “certainty” of a body of evidence is preferred over the term “quality.” [ 191 ] Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).

GRADE criteria for rating certainty of evidence

Rate down a: Risk of bias; Imprecision; Inconsistency; Indirectness; Publication bias

Rate up b: Large magnitude of effect; Dose–response gradient; All residual confounding would decrease magnitude of effect (in situations with an effect)

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

 ⊕  ⊕  ⊕  ⊕ High: We are very confident that the true effect lies close to that of the estimate of the effect
 ⊕  ⊕  ⊕ Moderate: We are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
 ⊕  ⊕ Low: Our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect
 ⊕ Very low: We have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

1. The certainty in the evidence (also known as quality of evidence or confidence in the estimates) should be defined consistently with the definitions used by the GRADE Working Group.
2. Explicit consideration should be given to each of the GRADE domains for assessing the certainty in the evidence (although different terminology may be used).
3. The overall certainty in the evidence should be assessed for each important outcome using four or three categories (such as high, moderate, low and/or very low) and definitions for each category that are consistent with the definitions used by the GRADE Working Group.
4. Evidence summaries … should be used as the basis for judgments about the certainty in the evidence.

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

Conduct guidance: Cochrane and/or JBI manuals, according to the type of review.

Reporting guidelines:
Protocol: PRISMA-P (all review types)
Review: PRISMA 2020; PRISMA-DTA (diagnostic test accuracy); eMERGe or ENTREQ (qualitative evidence syntheses); PRIOR (overviews of reviews); PRISMA-ScR (scoping reviews)
Synthesis without MA: SWiM, where no more directly relevant guideline applies

Risk of bias assessment of included studies:
For RCTs: Cochrane RoB 2
For NRSI: ROBINS-I
Other primary research: QUADAS-2 (diagnostic test accuracy studies); QUIPS (prognostic factor reviews); PROBAST (prognostic model reviews); CASP Qualitative Checklist or JBI Critical Appraisal Checklist (qualitative research); JBI checklist for studies reporting prevalence data; COSMIN RoB Checklist (measurement properties)

Appraisal of systematic reviews included in overviews: AMSTAR-2 or ROBIS; not required for scoping reviews

Certainty of the body of evidence: GRADE (intervention reviews); GRADE adaptations (diagnostic test accuracy, prognosis, prevalence and incidence, measurement properties); CERQual or ConQual (qualitative evidence syntheses); criteria for risk factors; not applicable to scoping reviews
AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM synthesis without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors who may disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle in which inadequate methodology is rewarded and potentially misleading conclusions are accepted is thus supported. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

Systematic review: A review that uses explicit, systematic methods to collate and synthesize findings of studies that address a clearly formulated question.
Statistical synthesis: The combination of quantitative results of two or more studies. This encompasses meta-analysis of effect estimates and other methods, such as combining values, calculating the range and distribution of observed effects, and vote counting based on the direction of effect.
Meta-analysis of effect estimates: A statistical technique used to synthesize results when study effect estimates and their variances are available, yielding a quantitative summary of results.
Outcome: An event or measurement collected for participants in a study (such as quality of life, mortality).
Result: The combination of a point estimate (such as a mean difference, risk ratio or proportion) and a measure of its precision (such as a confidence/credible interval) for a particular outcome.
Report: A document (paper or electronic) supplying information about a particular study. It could be a journal article, preprint, conference abstract, study register entry, clinical study report, dissertation, unpublished manuscript, government report, or any other document providing relevant information.
Record: The title or abstract (or both) of a report indexed in a database or website (such as a title or abstract for an article indexed in Medline). Records that refer to the same report (such as the same journal article) are “duplicates”; however, records that refer to reports that are merely similar (such as a similar abstract submitted to two different conferences) should be considered unique.
Study: An investigation, such as a clinical trial, that includes a defined group of participants and one or more interventions and outcomes. A “study” might have multiple reports. For example, reports could include the protocol, statistical analysis plan, baseline characteristics, results for the primary outcome, results for harms, results for secondary outcomes, and results for additional mediator and moderator analyses.

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

Preferred | Potentially problematic

Evidence synthesis with meta-analysis; Systematic review with meta-analysis | Meta-analysis
Overview or umbrella review | Systematic review of systematic reviews; Review of reviews; Meta-review
Randomized | Experimental
Non-randomized | Observational
Single case experimental design | Single-subject research; N-of-1 design
Case report or case series | Descriptive study
Methodological quality | Quality
Certainty of evidence | Quality of evidence; Grade of evidence; Level of evidence; Strength of evidence
Qualitative systematic review | Qualitative synthesis
Synthesis of qualitative data a | Qualitative synthesis
Synthesis without meta-analysis | Narrative synthesis b; narrative summary; Qualitative synthesis; Descriptive synthesis; descriptive summary

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the summarized evidence itself is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the shift in paradigm to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined and less duplicative and, more importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.

Acknowledgements

Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

Declarations

The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cochrane Methods Rapid Reviews

Welcome to the Cochrane Rapid Reviews Methods Group (RRMG) website. The RRMG is one of 17 Cochrane Methods Groups worldwide, each composed of individuals with interest and expertise in the science of systematic reviews.

While the concept of rapid evidence synthesis, or rapid review (RR), is not novel, it remains a poorly understood and as yet ill-defined set of diverse methodologies, supported by only a small body of published scientific literature. The speed with which RRs are gaining prominence and being incorporated into urgent decision-making underscores the need to explore their characteristics and use further. While rapid review producers must answer the time-sensitive needs of the health decision makers they serve, they must simultaneously ensure that the scientific imperative of methodological rigor is satisfied. To adequately address this inherent tension, a need for methodological research and standards development has been identified. For these reasons, we have established the Cochrane Rapid Reviews Methods Group (RRMG) to better inform rapid review methodology.

The scope of the RRMG is to inform rapid reviews in general, both within the Cochrane Collaboration and beyond. Extending its scope beyond the current purview of Cochrane’s work is an opportunity for Cochrane to position itself as a leader in rapid review methodology, just as it has been influential for systematic reviews in general.

Presently, the RRMG is composed of seven co-convenors and two associate convenors from Canada, the United States, Ireland and Austria, with virtual co-administration by the Ottawa Methods Centre, based at the Ottawa Hospital Research Institute (OHRI), and Cochrane Austria.

Learn more about us...

Cochrane Training

Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis

Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Key Points:

  • The scope of a review is defined by the types of population (participants), types of interventions (and comparisons), and the types of outcomes that are of interest. The acronym PICO (population, interventions, comparators and outcomes) serves as a reminder of these.
  • The population, intervention and comparison components of the question, with the additional specification of types of study that will be included, form the basis of the pre-specified eligibility criteria for the review. It is rare to use outcomes as eligibility criteria: studies should be included irrespective of whether they report outcome data, but may legitimately be excluded if they do not measure outcomes of interest, or if they explicitly aim to prevent a particular outcome.
  • Cochrane Reviews should include all outcomes that are likely to be meaningful and not include trivial outcomes. Critical and important outcomes should be limited in number and include adverse as well as beneficial outcomes.
  • Review authors should plan at the protocol stage how the different populations, interventions, outcomes and study designs within the scope of the review will be grouped for analysis.

Cite this chapter as: McKenzie JE, Brennan SE, Ryan RE, Thomson HJ, Johnston RV, Thomas J. Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis [last updated August 2023]. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook .

3.1 Introduction

One of the features that distinguishes a systematic review from a narrative review is that systematic review authors should pre-specify criteria for including and excluding studies in the review (eligibility criteria, see MECIR Box 3.2.a ).

When developing the protocol, one of the first steps is to determine the elements of the review question (including the population, intervention(s), comparator(s) and outcomes, or PICO elements) and how the intervention, in the specified population, produces the expected outcomes (see Chapter 2, Section 2.5.1 and Chapter 17, Section 17.2.1 ). Eligibility criteria are based on the PICO elements of the review question plus a specification of the types of studies that have addressed these questions. The population, interventions and comparators in the review question usually translate directly into eligibility criteria for the review, though this is not always a straightforward process and requires a thoughtful approach, as this chapter shows. Outcomes usually are not part of the criteria for including studies, and a Cochrane Review would typically seek all sufficiently rigorous studies (most commonly randomized trials) of a particular comparison of interventions in a particular population of participants, irrespective of the outcomes measured or reported. It should be noted that some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes; or a review may specifically address the adverse effects of an intervention used for several conditions (see Chapter 19 ).
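The relationship described above, in which the population, intervention, comparator and study-design elements form the eligibility criteria while outcomes usually do not, can be sketched as a simple data structure. This is a hypothetical illustration only, not a Cochrane data standard; all field names are invented for the example.

```python
# Illustrative sketch: eligibility criteria as PICO elements plus
# study designs. Field names are hypothetical, chosen to mirror
# the chapter's terminology.
from dataclasses import dataclass, field

@dataclass
class EligibilityCriteria:
    population: str
    interventions: list
    comparators: list
    study_designs: list
    # Outcomes are recorded to plan the synthesis but, as noted
    # above, are usually NOT used to include or exclude studies.
    outcomes_of_interest: list = field(default_factory=list)

criteria = EligibilityCriteria(
    population="adults with chronic heart failure",
    interventions=["beta-blocker"],
    comparators=["placebo", "standard care"],
    study_designs=["randomized trial"],
    outcomes_of_interest=["mortality", "hospitalization"],
)
print(criteria.population)
```

Keeping outcomes in a separate field mirrors the guidance that a review typically seeks all sufficiently rigorous studies of the comparison, irrespective of which outcomes they report.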

Eligibility criteria do not exist in isolation, but should be specified with the synthesis of the studies they describe in mind. This will involve making plans for how to group variants of the PICO elements for synthesis. This chapter describes the processes by which the structure of the synthesis can be mapped out at the beginning of the review, and the interplay between the review question, considerations for the analysis and their operationalization in terms of eligibility criteria. Decisions about which studies to include (and exclude), and how they will be combined in the review’s synthesis, should be documented and justified in the review protocol.

A distinction between three different stages in the review at which the PICO construct might be used is helpful for understanding the decisions that need to be made. In Chapter 2, Section 2.3 , we introduced the ideas of a review PICO (on which eligibility of studies is based), the PICO for each synthesis (defining the question that each specific synthesis aims to answer) and the PICO of the included studies (what was actually investigated in the included studies). In this chapter, we focus on the review PICO and the PICO for each synthesis as a basis for specifying which studies should be included in the review and planning its syntheses. These PICOs should relate clearly and directly to the questions or hypotheses that are posed when the review is formulated (see Chapter 2 ) and will involve specifying the population in question, and a set of comparisons between the intervention groups.

An integral part of the process of setting up the review is to specify which characteristics of the interventions (e.g. individual compounds of a drug), populations (e.g. acute and chronic conditions), outcomes (e.g. different depression measurement scales) and study designs, will be grouped together. Such decisions should be made independent of knowing which studies will be included and the methods of synthesis that will be used (e.g. meta-analysis). There may be a need to modify the comparisons and even add new ones at the review stage in light of the data that are collected. For example, important variations in the intervention may be discovered only after data are collected, or modifying the comparison may facilitate the possibility of synthesis when only one or few studies meet the comparison PICO. Planning for the latter scenario at the protocol stage may lead to less post-hoc decision making ( Chapter 2, Section 2.5.3 ) and, of course, any changes made during the conduct of the review should be recorded and documented in the final report.

3.2 Articulating the review and comparison PICO

3.2.1 Defining types of participants: which people and populations

The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies and the likely scenarios in which the interventions will be used, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered together; they should be specified in advance (see MECIR Box 3.2.a ). As discussed in Chapter 2, Section 2.3.1 , the degree of breadth will vary, depending on the question being asked and the analytical approach to be employed. A range of evidence may inform the choice of population characteristics to examine, including theoretical considerations, evidence from other interventions that have a similar mechanism of action, and in vitro or animal studies. Consideration should be given to whether the population characteristic is at the level of the participant (e.g. age, severity of disease) or the study (e.g. care setting, geographical location), since this has implications for grouping studies and for the method of synthesis ( Chapter 10, Section 10.11.5 ). It is often helpful to consider the types of people that are of interest in three steps.

MECIR Box 3.2.a Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for participants

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. The criteria for considering types of people included in studies in a review should be sufficiently broad to encompass the likely diversity of studies, but sufficiently narrow to ensure that a meaningful answer can be obtained when studies are considered in aggregate. Considerations when specifying participants include setting, diagnosis or definition of condition and demographic factors. Any restrictions to study populations must be based on a sound rationale, since it is important that Cochrane Reviews are widely relevant.

Predefining a strategy for studies with a subset of eligible participants

Sometimes a study includes some ‘eligible’ participants and some ‘ineligible’ participants, for example when an age cut-off is used in the review’s eligibility criteria. If data from the eligible participants cannot be retrieved, a mechanism for dealing with this situation should be pre-specified.

First, the diseases or conditions of interest should be defined using explicit criteria for establishing their presence (or absence). Criteria that will force the unnecessary exclusion of studies should be avoided. For example, diagnostic criteria that were developed more recently – which may be viewed as the current gold standard for diagnosing the condition of interest – will not have been used in earlier studies. Expensive or recent diagnostic tests may not be available in many countries or settings, and time-consuming tests may not be practical in routine healthcare settings.

Second, the broad population and setting of interest should be defined . This involves deciding whether a specific population group is within scope, determined by factors such as age, sex, race, educational status or the presence of a particular condition such as angina or shortness of breath. Interest may focus on a particular setting such as a community, hospital, nursing home, chronic care institution, or outpatient setting. Box 3.2.a outlines some factors to consider when developing population criteria.

Whichever criteria are used for defining the population and setting of interest, it is common to encounter studies that only partially overlap with the review’s population. For example, in a review focusing on children, a cut-point of less than 16 years might be desirable, but studies may be identified with participants aged from 12 to 18. Unless the study reports separate data from the eligible section of the population (in which case data from the eligible participants can be included in the review), review authors will need a strategy for dealing with these studies (see MECIR Box 3.2.a ). This will involve balancing concerns about reduced applicability by including participants who do not meet the eligibility criteria, against the loss of data when studies are excluded. Arbitrary rules (such as including a study if more than 80% of the participants are under 16) will not be practical if detailed information is not available from the study. A less stringent rule, such as ‘the majority of participants are under 16’ may be sufficient. Although there is a risk of review authors’ biases affecting post-hoc inclusion decisions (which is why many authors endeavour to pre-specify these rules), this may be outweighed by a common-sense strategy in which eligibility decisions keep faith with the objectives of the review rather than with arbitrary rules. Difficult decisions should be documented in the review, checked with the advisory group (if available, see Chapter 1 ), and sensitivity analyses can assess the impact of these decisions on the review’s findings (see Chapter 10, Section 10.14 and MECIR Box 3.2.b ).
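The kind of pre-specified rule discussed above, for studies that only partially overlap the review's population, can be sketched as a small decision helper. This is an illustrative sketch only, not a Cochrane tool; the threshold values and field names are hypothetical.

```python
# Illustrative sketch: applying a pre-specified rule for studies
# that only partially overlap the review's age cut-off.
# Thresholds and field names are hypothetical.

def include_study(age_fractions, threshold=0.5):
    """Return True if the fraction of participants under the
    review's age cut-off meets the pre-specified threshold,
    False if it does not, and None if the study's report does
    not give the breakdown (flagging a documented judgement
    call rather than an automatic decision)."""
    under = age_fractions.get("under_16")
    if under is None:
        return None
    return under >= threshold

# A study reporting 70% of participants under 16 meets a
# 'majority under 16' rule but fails a stricter 80% rule.
print(include_study({"under_16": 0.7}))        # majority rule
print(include_study({"under_16": 0.7}, 0.8))   # stricter 80% rule
print(include_study({}))                       # breakdown not reported
```

The `None` branch reflects the chapter's point that arbitrary percentage rules are impractical when the study does not report the detailed breakdown, so those cases need a documented judgement call and, ideally, a sensitivity analysis.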

Box 3.2.a Factors to consider when developing criteria for ‘Types of participants’

MECIR Box 3.2.b Relevant expectations for conduct of intervention reviews

Changing eligibility criteria

Following pre-specified eligibility criteria is a fundamental attribute of a systematic review. However, unanticipated issues may arise. Review authors should make sensible post-hoc decisions about exclusion of studies, and these should be documented in the review, possibly accompanied by sensitivity analyses. Changes to the protocol must not be made on the basis of the findings of the studies or the synthesis, as this can introduce bias.

Third, there should be consideration of whether there are population characteristics that might be expected to modify the size of the intervention effects (e.g. different severities of heart failure). Identifying subpopulations may be important for implementation of the intervention. If relevant subpopulations are identified, two courses of action are possible: limiting the scope of the review to exclude certain subpopulations; or maintaining the breadth of the review and addressing subpopulations in the analysis.

Restricting the review with respect to specific population characteristics or settings should be based on a sound rationale. It is important that Cochrane Reviews are globally relevant, so the rationale for the exclusion of studies based on population characteristics should be justified. For example, focusing a review of the effectiveness of mammographic screening on women between 40 and 50 years old may be justified based on biological plausibility, previously published systematic reviews and existing controversy. On the other hand, focusing a review on a particular subgroup of people on the basis of their age, sex or ethnicity simply because of personal interests, when there is no underlying biologic or sociological justification for doing so, should be avoided, as these reviews will be less useful to decision makers and readers of the review.

Maintaining the breadth of the review may be best when it is uncertain whether there are important differences in effects among various subgroups of people, since this allows investigation of these differences (see Chapter 10, Section 10.11.5 ). Review authors may combine the results from different subpopulations in the same synthesis, examining whether a given subdivision explains variation (heterogeneity) among the intervention effects. Alternatively, the results may be synthesized in separate comparisons representing different subpopulations. Splitting by subpopulation risks there being too few studies to yield a useful synthesis (see Table 3.2.a and Chapter 2, Section 2.3.2 ). Consideration needs to be given to the subgroup analysis method, particularly for population characteristics measured at the participant level (see Chapter 10 and Chapter 26 , Fisher et al 2017). All subgroup analyses should ideally be planned a priori and stated as a secondary objective in the protocol, and not driven by the availability of data.

In practice, it may be difficult to assign included studies to defined subpopulations because of missing information about the population characteristic, variability in how the population characteristic is measured across studies (e.g. variation in the method used to define the severity of heart failure), or because the study does not wholly fall within (or report the results separately by) the defined subpopulation. The latter issue mainly applies for participant characteristics but can also arise for settings or geographic locations where these vary within studies. Review authors should consider planning for these scenarios (see example reviews Hetrick et al 2012, Safi et al 2017; Table 3.2.b , column 3).

Table 3.2.a Examples of population attributes and characteristics

Intended recipient of intervention

Patient, carer, healthcare provider (general practitioners, nurses, allied health professionals), health system, policy maker, community

In a review of e-learning programmes for health professionals, a subgroup analysis was planned to examine if the effects were modified by the type of health professional (doctors, nurses or physiotherapists). The authors hypothesized that e-learning programmes for doctors would be more effective than for other health professionals, but did not provide a rationale (Vaona et al 2018).

Disease/condition (to be treated or prevented)

Type and severity of a condition

In a review of platelet-rich therapies for musculoskeletal soft tissue injuries, a subgroup analysis was undertaken to examine if the effects of platelet-rich therapies were modified by the type of injury (e.g. rotator cuff tear, anterior cruciate ligament reconstruction, chronic Achilles tendinopathy) (Moraes et al 2014).

In planning a review of beta-blockers for heart failure, subgroup analyses were specified to examine if the effects of beta-blockers are modified by the cause of heart failure (e.g. idiopathic dilated cardiomyopathy, ischaemic heart disease, valvular heart disease, hypertension) and the severity of heart failure (‘reduced left ventricular ejection fraction (LVEF)’ ≤ 40%, ‘mid-range LVEF’ > 40% and < 50%, ‘preserved LVEF’ ≥ 50%, mixed, not specified). Studies have shown that patient characteristics and comorbidities differ by heart failure severity, and that therapies have been shown to reduce morbidity in ‘reduced LVEF’ patients, but the benefits in the other groups are uncertain (Safi et al 2017).

Participant characteristics

Age (neonate, child, adolescent, adult, older adult)

Race/ethnicity

Sex/gender

PROGRESS-Plus equity characteristics (e.g. place of residence, socio-economic status, education) (O’Neill et al 2014)

In a review of newer-generation antidepressants for depressive disorders in children and adolescents, a subgroup analysis was undertaken to examine if the effects of the antidepressants were modified by age. The rationale was based on the findings of another review that suggested that children and adolescents may respond differently to antidepressants. The age groups were defined as ‘children’ (aged approximately 6 to 12 years), ‘adolescents’ (aged approximately 13 to 18 years), and ‘children and adolescents’ (when the study included both children and adolescents, and results could not be obtained separately by these subpopulations) (Hetrick et al 2012).

Setting

Setting of care (primary care, hospital, community)

Rurality (urban, rural, remote)

Socio-economic setting (low and middle-income countries, high-income countries)

Hospital ward (e.g. intensive care unit, general medical ward, outpatient)

In a review of hip protectors for preventing hip fractures in older people, separate comparisons were specified based on setting (institutional care or community-dwelling) for the critical outcome of hip fracture (Santesso et al 2014).

3.2.2 Defining interventions and how they will be grouped

In some reviews, predefining the intervention ( MECIR Box 3.2.c ) may be straightforward. For example, in a review of the effect of a given anticoagulant on deep vein thrombosis, the intervention can be defined precisely. A more complicated definition might be required for a multi-component intervention composed of dietary advice, training and support groups to reduce rates of obesity in a given population.

The inherent complexity present when defining an intervention often comes to light when considering how it is thought to achieve its intended effect and whether the effect is likely to differ when variants of the intervention are used. In the first example, the anticoagulant warfarin is thought to reduce blood clots by blocking an enzyme that depends on vitamin K to generate clotting factors. In the second, the behavioural intervention is thought to increase individuals’ self-efficacy in their ability to prepare healthy food. In both examples, we cannot assume that all forms of the intervention will work in the same way. When defining drug interventions, such as anticoagulants, factors such as the drug preparation, route of administration, dose, duration, and frequency should be considered. For multi-component interventions (such as interventions to reduce rates of obesity), the common or core features of the interventions must be defined, so that the review authors can clearly differentiate them from other interventions not included in the review.

MECIR Box 3.2.c Relevant expectations for conduct of intervention reviews

Predefining unambiguous criteria for interventions and comparators

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. Specification of comparator interventions requires particular clarity: are the experimental interventions to be compared with an inactive control intervention (e.g. placebo, no treatment, standard care, or a waiting list control), or with an active control intervention (e.g. a different variant of the same intervention, a different drug, a different kind of therapy)? Any restrictions on interventions and comparators, for example, regarding delivery, dose, duration, intensity, co-interventions and features of complex interventions should also be predefined and explained.

In general, it is useful to consider exactly what is delivered, who delivers it, how it is delivered, where it is delivered, when and how much is delivered, and whether the intervention can be adapted or tailored , and to consider this for each type of intervention included in the review (see the TIDieR checklist (Hoffmann et al 2014)). As argued in Chapter 17 , separating interventions into ‘simple’ and ‘complex’ is a false dichotomy; all interventions can be complex in some ways. The critical issue for review authors is to identify the most important factors to be considered in a specific review. Box 3.2.b outlines some factors to consider when developing broad criteria for the ‘Types of interventions’ (and comparisons).

Box 3.2.b Factors to consider when developing criteria for ‘Types of interventions’

Once interventions eligible for the review have been broadly defined, decisions should be made about how variants of the intervention will be handled in the synthesis. Differences in intervention characteristics across studies occur in all reviews. If these reflect minor differences in the form of the intervention used in practice (such as small differences in the duration or content of brief alcohol counselling interventions), then an overall synthesis can provide useful information for decision makers. Where differences in intervention characteristics are more substantial (such as delivery of brief alcohol counselling by nurses versus doctors), and are expected to have a substantial impact on the size of intervention effects, these differences should be examined in the synthesis. What constitutes an important difference requires judgement, but in general differences that alter decisions about how an intervention is implemented or whether the intervention is used or not are likely to be important. In such circumstances, review authors should consider specifying separate groups (or subgroups) to examine in their synthesis.

Clearly defined intervention groups serve two main purposes in the synthesis. First, the way in which interventions are grouped for synthesis (meta-analysis or other synthesis) is likely to influence review findings. Careful planning of intervention groups makes best use of the available data, avoids decisions that are influenced by study findings (which may introduce bias), and produces a review focused on questions relevant to decision makers. Second, the intervention groups specified in a protocol provide a standardized terminology for describing the interventions throughout the review, overcoming the varied descriptions used by study authors (e.g. where different labels are used for the same intervention, or similar labels used for different techniques) (Michie et al 2013). This standardization enables comparison and synthesis of information about intervention characteristics across studies (common characteristics and differences) and provides a consistent language for reporting that supports interpretation of review findings.
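The standardization described above, in which varied author labels are mapped to the review's pre-specified groups, can be sketched as a simple lookup. This is an illustrative sketch only; the labels and group names are hypothetical, and real reviews would document each mapping decision.

```python
# Illustrative sketch: a pre-specified mapping from study authors'
# varied intervention labels to the review's standardized groups,
# so grouping is fixed at the protocol stage rather than driven by
# study findings. All labels here are hypothetical examples.

GROUPS = {
    "cognitive behavioural therapy": "CBT",
    "cognitive-behavioral therapy": "CBT",
    "CBT": "CBT",
    "supportive counselling": "supportive psychotherapy",
    "non-directive support": "supportive psychotherapy",
}

def assign_group(study_label):
    # Unmatched labels are flagged for a documented judgement
    # call rather than silently dropped.
    return GROUPS.get(study_label.strip(), "unclassified: review needed")

print(assign_group("cognitive-behavioral therapy"))
print(assign_group("mindfulness training"))
```

Keeping the mapping explicit in the protocol makes it auditable and helps avoid the data-driven grouping decisions the chapter warns against.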

Table 3.2.b   outlines a process for planning intervention groups as a basis for/precursor to synthesis, and the decision points and considerations at each step. The table is intended to guide, rather than to be prescriptive and, although it is presented as a sequence of steps, the process is likely to be iterative, and some steps may be done concurrently or in a different sequence. The process aims to minimize data-driven approaches that can arise once review authors have knowledge of the findings of the included studies. It also includes principles for developing a flexible plan that maximizes the potential to synthesize in circumstances where there are few studies, many variants of an intervention, or where the variants are difficult to anticipate. In all stages, review authors should consider how to categorize studies whose reports contain insufficient detail.

Table 3.2.b A process for planning intervention groups for synthesis

1. Identify intervention characteristics that may modify the effect of the intervention.

Consider whether differences in interventions characteristics might modify the size of the intervention effect importantly. Content-specific research literature and expertise should inform this step.

The TIDieR checklist – a tool for describing interventions – outlines the characteristics across which an intervention might differ (Hoffmann et al 2014). These include ‘what’ materials and procedures are used, ‘who’ provides the intervention, ‘when and how much’ intervention is delivered. The iCAT-SR tool provides equivalent guidance for complex interventions (Lewin et al 2017).

Interventions may differ across multiple characteristics, which vary in importance depending on the review.

In a review of exercise for osteoporosis, whether the exercise is weight-bearing or non-weight-bearing may be a key characteristic, since the mechanism by which exercise is thought to work is by placing stress or mechanical load on bones (Howe et al 2011).

Different mechanisms apply in reviews of exercise for knee osteoarthritis (muscle strengthening), falls prevention (gait and balance), cognitive function (cardiovascular fitness).

The differing mechanisms might suggest different ways of grouping interventions (e.g. by intensity, mode of delivery) according to potential modifiers of the intervention effects.

2a. Label and define intervention groups to be considered in the synthesis.

 

For each intervention group, provide a short label (e.g. supportive psychotherapy) and describe the core characteristics (criteria) that will be used to assign each intervention from an included study to a group.

Groups are often defined by intervention content (especially the active components), such as materials, procedures or techniques (e.g. a specific drug, an information leaflet, a behaviour change technique). Other characteristics may also be used, although some are more commonly used to define subgroups: the purpose or theoretical underpinning, mode of delivery, provider, dose or intensity, duration or timing of the intervention (Hoffmann et al 2014).

In specifying groups:

Logic models may help structure the synthesis.

In a review of psychological therapies for coronary heart disease, a single group was specified for meta-analysis that included all types of therapy. Subgroups were defined to examine whether intervention effects were modified by intervention components (e.g. cognitive techniques, stress management) or mode of delivery (e.g. individual, group) (Richards et al 2017).

In a review of psychological therapies for panic disorder (Pompoli et al 2016), eight types of therapy were specified:

1. psychoeducation;

2. supportive psychotherapy (with or without a psychoeducational component);

3. physiological therapies;

4. behaviour therapy;

5. cognitive therapy;

6. cognitive behaviour therapy (CBT);

7. third-wave CBT; and

8. psychodynamic therapies.

Groups were defined by the theoretical basis of each therapy (e.g. CBT aims to modify maladaptive thoughts through cognitive restructuring) and the component techniques used.

2b. Define levels for groups based on dose or intensity.

For groups based on ‘how much’ of an intervention is used (e.g. dose or intensity), criteria are needed to quantify each group. This may be straightforward for easy-to-quantify characteristics, but more complex for characteristics that are hard to quantify (e.g. duration or intensity of rehabilitation or psychological therapy).

The levels should be based on how the intervention is used in practice (e.g. cut-offs for low and high doses of a supplement based on recommended nutrient intake), or on a rationale for how the intervention might work.

In reviews of exercise, intensity may be defined by training time (session length, frequency, program duration), amount of work (e.g. repetitions), and effort/energy expenditure (exertion, heart rate) (Regnaux et al 2015).

In a review of organized inpatient care for stroke, acute stroke units were categorized as ‘intensive’, ‘semi-intensive’ or ‘non-intensive’ based on whether the unit had continuous monitoring, high nurse staffing, and life support facilities (Stroke Unit Trialists Collaboration 2013).
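As a minimal sketch of how prespecified dose cut-offs might be operationalized during data extraction (the supplement, units and thresholds below are hypothetical, chosen only to illustrate the principle):

```python
# Hypothetical cut-offs mapping a reported daily vitamin D dose (IU/day)
# onto prespecified levels; the values are illustrative, not recommendations.
CUTOFFS = [(400.0, "low"), (2000.0, "moderate")]  # upper bounds (exclusive)

def dose_level(iu_per_day):
    """Return the prespecified dose level for a reported daily dose."""
    for upper, label in CUTOFFS:
        if iu_per_day < upper:
            return label
    return "high"
```

Keeping the cut-offs in a single prespecified table makes it easier to document the rationale in the protocol and to apply the rule consistently across included studies.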

3. Determine whether there is an existing system for grouping interventions.

 

In some fields, intervention taxonomies and frameworks have been developed for labelling and describing interventions, and these can make it easier for those using a review to interpret and apply findings.

Using an agreed system is preferable to developing new groupings. Existing systems should be assessed for relevance and usefulness. The most useful systems:

Systems for grouping interventions may be generic, widely applicable across clinical areas, or specific to a condition or intervention type. Some Cochrane Groups recommend specific taxonomies.

The behaviour change technique (BCT) taxonomy (Michie et al 2013) categorizes intervention elements such as goal setting, self-monitoring and social support. A protocol for a review of social media interventions used this taxonomy to describe interventions and examine different BCTs as potential effect modifiers (Welch et al 2018).

The behaviour change wheel has been used to group interventions (or components) by function (e.g. to educate, persuade, enable) (Michie et al 2011). This system was used to describe the components of dietary advice interventions (Desroches et al 2013).

 

Multiple reviews have used the consensus-based taxonomy developed by the Prevention of Falls Network Europe (ProFaNE) (e.g. Verheyden et al 2013, Kendrick et al 2014). The taxonomy specifies broad groups (e.g. exercise, medication, environment/assistive technology) within which are more specific groups (e.g. exercise: gait, balance and functional training; flexibility; strength and resistance) (Lamb et al 2011).

4. Plan how the specified groups will be used in synthesis and reporting.

Decide whether it is useful to pool all interventions in a single meta-analysis (‘lumping’), within which specific characteristics can be explored as effect modifiers (e.g. in subgroups). Alternatively, if pooling all interventions is unlikely to address a useful question, separate synthesis of specific interventions may be more appropriate (‘splitting’).

Determining the right analytic approach is discussed further in .

In a review of exercise for knee osteoarthritis, the different categories of exercise were combined in a single meta-analysis, addressing the question ‘what is the effect of exercise on knee osteoarthritis?’. The categories were also analysed as subgroups within the meta-analysis to explore whether the effect size varied by type of exercise (Fransen et al 2015). Other subgroup analyses examined mode of delivery and dose.

5. Decide how to group interventions with multiple components or co-interventions.

Some interventions, especially those considered ‘complex’, include multiple components that could also be implemented independently (Guise et al 2014, Lewin et al 2017). These components might be eligible for inclusion in the review alone, or eligible only if used alongside an eligible intervention.

Options for considering multi-component interventions may include the following.

(See Welton et al 2009, Caldwell and Welton 2016 and Higgins et al 2019.)

The first two approaches may be challenging but are likely to be most useful (Caldwell and Welton 2016).

See Section 3.2.3.1 for the special case in which a co-intervention is administered in both treatment arms.

In a review of psychological therapies for panic disorder, two of the eight eligible therapies (psychoeducation and supportive psychotherapy) could be used alone or as part of a multi-component therapy. When accompanied by another eligible therapy, the intervention was categorized as the other therapy (i.e. psychoeducation + cognitive behavioural therapy was categorized as cognitive behavioural therapy) (Pompoli et al 2016).

 

In a review of psychosocial interventions for smoking cessation in pregnancy, two approaches were used. All intervention types were included in a single meta-analysis with subgroups for multi-component, single and tailored interventions. Separate meta-analyses were also performed for each intervention type, with categorization of multi-component interventions based on the ‘main’ component (Chamberlain et al 2017).

6. Build in contingencies by specifying both specific and broader intervention groups.

Consider grouping interventions at more than one level, so that studies of a broader group of interventions can be synthesized if too few studies are identified for synthesis in more specific groups. This will provide flexibility where review authors anticipate few studies contributing to specific groups (e.g. in reviews with diverse interventions, additional diversity in other PICO elements, or few studies overall, see also ).

In a review of psychosocial interventions for smoking cessation, the authors planned to group any psychosocial intervention in a single comparison (addressing the higher level question of whether, on average, psychosocial interventions are effective). Given that sufficient data were available, they also presented separate meta-analyses to examine the effects of specific types of psychosocial interventions (e.g. counselling, health education, incentives, social support) (Chamberlain et al 2017).

3.2.3 Defining which comparisons will be made

When articulating the PICO for each synthesis, defining the intervention groups alone is not sufficient for complete specification of the planned syntheses. The next step is to define the comparisons that will be made between the intervention groups. Setting aside for a moment more complex analyses such as network meta-analyses, which can simultaneously compare many groups ( Chapter 11 ), standard meta-analysis ( Chapter 10 ) aims to draw conclusions about the comparative effects of two groups at a time (i.e. which of two intervention groups is more effective?). These comparisons form the basis for the syntheses that will be undertaken if data are available. Cochrane Reviews sometimes include one comparison, but most often include multiple comparisons. Three commonly identified types of comparisons include the following (Davey et al 2011).

Intervention versus placebo, for example:
  • newer generation antidepressants versus placebo (Hetrick et al 2012); and
  • vertebroplasty for osteoporotic vertebral compression fractures versus placebo (sham procedure) (Buchbinder et al 2018).

Intervention versus control (e.g. usual care), for example:
  • chemotherapy or targeted therapy plus best supportive care (BSC) versus BSC for palliative treatment of esophageal and gastroesophageal-junction carcinoma (Janmaat et al 2017); and
  • personalized care planning versus usual care for people with long-term conditions (Coulter et al 2015).

One intervention versus another, for example:
  • early (commenced at less than two weeks of age) versus late (two weeks of age or more) parenteral zinc supplementation in term and preterm infants (Taylor et al 2017);
  • high intensity versus low intensity physical activity or exercise in people with hip or knee osteoarthritis (Regnaux et al 2015); and
  • multimedia education versus other education for consumers about prescribed and over the counter medications (Ciciriello et al 2013).

The first two types of comparisons aim to establish the effectiveness of an intervention, while the last aims to compare the effectiveness of two interventions. However, the distinction between placebo and control is often arbitrary, since any differences in the care provided between trials with a control arm and those with a placebo arm may be unimportant, especially where ‘usual care’ is provided to both. Therefore, placebo and control groups may be judged similar enough to be combined for synthesis.

In reviews including multiple intervention groups, many comparisons are possible. In some of these reviews, authors seek to synthesize evidence on the comparative effectiveness of all their included interventions, including where there may be only indirect comparison of some interventions across the included studies ( Chapter 11, Section 11.2.1 ). However, in many reviews including multiple intervention groups, a limited subset of the possible comparisons will be selected. The chosen subset of comparisons should address the most important clinical and research questions. For example, if an established intervention (or dose of an intervention) is used in practice, then the synthesis would ideally compare novel or alternative interventions to this established intervention, and not, for example, to no intervention.

3.2.3.1 Dealing with co-interventions

Planning is needed for the special case where the same supplementary intervention is delivered to both the intervention and comparator groups. A supplementary intervention is an additional intervention delivered alongside the intervention of interest, such as massage in a review examining the effects of aromatherapy (i.e. aromatherapy plus massage versus massage alone). In many cases, the supplementary intervention will be unimportant and can be ignored. In other situations, the effect of the intervention of interest may differ according to whether participants receive the supplementary therapy. For example, the effect of aromatherapy among people who receive a massage may differ from the effect of the aromatherapy given alone. This will be the case if the intervention of interest interacts with the supplementary intervention leading to larger (synergistic) or smaller (dysynergistic/antagonistic) effects than the intervention of interest alone (Squires et al 2013). While qualitative interactions are rare (where the effect of the intervention is in the opposite direction when combined with the supplementary intervention), it is possible that there will be more variation in the intervention effects (heterogeneity) when supplementary interventions are involved, and it is important to plan for this. Approaches for dealing with this in the statistical synthesis may include fitting a random-effects meta-analysis model that encompasses heterogeneity ( Chapter 10, Section 10.10.4 ), or investigating whether the intervention effect is modified by the addition of the supplementary intervention through subgroup analysis ( Chapter 10, Section 10.11.2 ).
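To illustrate the random-effects approach mentioned above, here is a rough sketch of the commonly used DerSimonian-Laird estimator (the data in any use of it would come from the included studies; in practice review authors would rely on dedicated software such as RevMan or the R package metafor rather than hand-rolled code):

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate.

    effects: per-study effect estimates (e.g. mean differences)
    variances: the corresponding within-study variances
    """
    w = [1.0 / v for v in variances]                  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)     # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2
```

The estimated between-study variance (tau²) absorbs the extra heterogeneity, such as that introduced by supplementary interventions, that a fixed-effect model would ignore.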

3.2.4 Selecting, prioritizing and grouping review outcomes

3.2.4.1 Selecting review outcomes

Broad outcome domains are decided at the time of setting up the review PICO (see Chapter 2 ). Once the broad domains are agreed, further specification is required to define the domains to facilitate reporting and synthesis (i.e. the PICO for comparison) (see Chapter 2, Section 2.3 ). The process for specifying and grouping outcomes largely parallels that used for specifying intervention groups.

Reporting of outcomes should rarely determine study eligibility for a review. In particular, studies should not be excluded because they do not report results of an outcome they may have measured, or provide ‘no usable data’ ( MECIR Box 3.2.d ). This is essential to avoid bias arising from selective reporting of findings by the study authors (see Chapter 13 ). However, in some circumstances, the measurement of certain outcomes may be a study eligibility criterion. This may be the case, for example, when the review addresses the potential for an intervention to prevent a particular outcome, or when the review addresses a specific purpose of an intervention that can be used in the same population for different purposes (such as hormone replacement therapy, or aspirin).

MECIR Box 3.2.d Relevant expectations for conduct of intervention reviews

Clarifying role of outcomes

Outcome measures should not always form part of the criteria for including studies in a review. However, some reviews do legitimately restrict eligibility to specific outcomes. For example, the same intervention may be studied in the same population for different purposes (e.g. hormone replacement therapy, or aspirin); or a review may address specifically the adverse effects of an intervention used for several conditions. If authors do exclude studies on the basis of outcomes, care should be taken to ascertain that relevant outcomes are not available because they have not been measured rather than simply not reported.

Predefining outcome domains

Full specification of the outcomes includes consideration of outcome domains (e.g. quality of life) and outcome measures (e.g. SF-36). Predefinition of outcomes reduces the risk of selective outcome reporting. The critical outcomes should be as few as possible and should normally reflect at least one potential benefit and at least one potential area of harm. It is expected that the review should be able to synthesize these outcomes if eligible studies are identified, and that the conclusions of the review will be based largely on the effects of the interventions on these outcomes. Additional important outcomes may also be specified. Up to seven critical and important outcomes will form the basis of the GRADE assessment and be summarized in the review’s abstract and other summary formats, although the review may measure more than seven outcomes.

Choosing outcomes

Cochrane Reviews are intended to support clinical practice and policy, and should address outcomes that are critical or important to consumers. These should be specified at protocol stage. Where available, established sets of core outcomes should be used. Patient-reported outcomes should be included where possible. It is also important to judge whether evidence of resource use and costs might be an important component of decisions to adopt the intervention or alternative management strategies around the world. Large numbers of outcomes, while sometimes necessary, can make reviews unfocused, unmanageable for the user, and prone to selective outcome reporting bias. Biochemical, interim and process outcomes should be considered where they are important to decision makers. Any outcomes that would not be described as critical or important can be left out of the review.

Predefining outcome measures

Having decided what outcomes are of interest to the review, authors should clarify acceptable ways in which these outcomes can be measured. It may be difficult, however, to predefine adverse effects.

C17: Predefining choices from multiple outcome measures

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. A predefined hierarchy of outcome measures may be helpful. It may be difficult, however, to predefine adverse effects. A rationale should be provided for the choice of outcome measure.

C18: Predefining time points of interest

Prespecification guards against selective outcome reporting, and allows users to confirm that choices were not overly influenced by the results. Authors may consider whether all time frames or only selected time points will be included in the review. These decisions should be based on outcomes important for making healthcare decisions. One strategy to make use of the available data could be to group time points into prespecified intervals to represent ‘short-term’, ‘medium-term’ and ‘long-term’ outcomes and to take no more than one from each interval from each study for any particular outcome.

In general, systematic reviews should aim to include outcomes that are likely to be meaningful to the intended users and recipients of the reviewed evidence. This may include clinicians, patients (consumers), the general public, administrators and policy makers. Outcomes may include survival (mortality), clinical events (e.g. strokes or myocardial infarction), behavioural outcomes (e.g. changes in diet, use of services), patient-reported outcomes (e.g. symptoms, quality of life), adverse events, burdens (e.g. demands on caregivers, frequency of tests, restrictions on lifestyle) and economic outcomes (e.g. cost and resource use). It is critical that outcomes used to assess adverse effects as well as outcomes used to assess beneficial effects are among those addressed by a review (see Chapter 19 ).

Outcomes that are trivial or meaningless to decision makers should not be included in Cochrane Reviews. Inclusion of outcomes that are of little or no importance risks overwhelming and potentially misleading readers. Interim or surrogate outcome measures, such as laboratory results or radiologic results (e.g. loss of bone mineral content as a surrogate for fractures in hormone replacement therapy), while potentially helpful in explaining effects or determining intervention integrity (see Chapter 5, Section 5.3.4.1 ), can also be misleading since they may not predict clinically important outcomes accurately. Many interventions reduce the risk for a surrogate outcome but have no effect or have harmful effects on clinically relevant outcomes, and some interventions have no effect on surrogate measures but improve clinical outcomes.

Various sources can be used to develop a list of relevant outcomes, including input from consumers and advisory groups (see Chapter 2 ), the clinical experiences of the review authors, and evidence from the literature (including qualitative research about outcomes important to those affected (see Chapter 21 )). A further driver of outcome selection is consideration of outcomes used in related reviews. Harmonization of outcomes across reviews addressing related questions facilitates broader evidence synthesis questions being addressed through the use of Overviews of reviews (see Chapter V ).

Outcomes considered to be meaningful, and therefore addressed in a review, may not have been reported in the primary studies. For example, quality of life is an important outcome, perhaps the most important outcome, for people considering whether or not to use chemotherapy for advanced cancer, even if the available studies are found to report only survival (see Chapter 18 ). A further example arises with timing of the outcome measurement, where time points determined as clinically meaningful in a review are not measured in the primary studies. Including and discussing all important outcomes in a review will highlight gaps in the primary research and encourage researchers to address these gaps in future studies.

3.2.4.2 Prioritizing review outcomes

Once a full list of relevant outcomes has been compiled for the review, authors should prioritize the outcomes and select the outcomes of most relevance to the review question. The GRADE approach to assessing the certainty of evidence (see Chapter 14 ) suggests that review authors separate outcomes into those that are ‘critical’, ‘important’ and ‘not important’ for decision making.

The critical outcomes are the essential outcomes for decision making, and are those that would form the basis of a ‘Summary of findings’ table or other summary versions of the review, such as the Abstract or Plain Language Summary. ‘Summary of findings’ tables provide key information about the amount of evidence for important comparisons and outcomes, the quality of the evidence and the magnitude of effect (see Chapter 14, Section 14.1 ). There should be no more than seven outcomes included in a ‘Summary of findings’ table, and those outcomes that will be included in summaries should be specified at the protocol stage. They should generally not include surrogate or interim outcomes. They should not be chosen on the basis of any anticipated or observed magnitude of effect, or because they are likely to have been addressed in the studies to be reviewed. Box 3.2.c summarizes the principal factors to consider when selecting and prioritizing review outcomes.

Box 3.2.c Factors to consider when selecting and prioritizing review outcomes

3.2.4.3 Defining and grouping outcomes for synthesis

Table 3.2.c outlines a process for planning for the diversity in outcome measurement that may be encountered in the studies included in a review and which can complicate, and sometimes prevent, synthesis. Research has repeatedly documented inconsistency in the outcomes measured across trials in the same clinical areas (Harrison et al 2016, Williamson et al 2017). This inconsistency occurs across all aspects of outcome measurement, including the broad domains considered, the outcomes measured, the way these outcomes are labelled and defined, and the methods and timing of measurement. For example, a review of outcome measures used in 563 studies of interventions for dementia and mild cognitive impairment found that 321 unique measurement methods were used for 1278 assessments of cognitive outcomes (Harrison et al 2016). Initiatives like COMET ( Core Outcome Measures in Effectiveness Trials ) aim to encourage standardization of outcome measurement across trials (Williamson et al 2017), but these initiatives are comparatively new and review authors will inevitably encounter diversity in outcomes across studies.

The process begins by describing the scope of each outcome domain in sufficient detail to enable outcomes from included studies to be categorized ( Table 3.2.c Step 1). This step may be straightforward in areas for which core outcome sets (or equivalent systems) exist ( Table 3.2.c Step 2). The methods and timing of outcome measurement also need to be specified, giving consideration to how differences across studies will be handled ( Table 3.2.c Steps 3 and 4). Subsequent steps consider options for dealing with studies that report multiple measures within an outcome domain ( Table 3.2.c Step 5), planning how outcome domains will be used in synthesis ( Table 3.2.c Step 6), and building in contingencies to maximize potential to synthesize ( Table 3.2.c Step 7).

Table 3.2.c A process for planning outcome groups for synthesis

1. Fully specify outcome domains.

For each outcome domain, provide a short label (e.g. cognition, consumer evaluation of care) and describe the domain in sufficient detail to enable eligible outcomes from each included study to be categorized. The definition should be based on the concept (or construct) measured, that is ‘what’ is measured. ‘When’ and ‘how’ the outcome is measured will be considered in subsequent steps.

Outcomes can be defined hierarchically, starting with very broad groups (e.g. physiological/clinical outcomes, life impact, adverse events), then outcome domains (e.g. functioning and perceived health status are domains within ‘life impact’). Within these may be narrower domains (e.g. physical function, cognitive function), and then specific outcome measures (Dodd et al 2018). The level at which outcomes are grouped for synthesis alters the question addressed, and so decisions should be guided by the review objectives.
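One hedged way to represent such a hierarchy when categorizing reported outcomes is a simple nested mapping (the groups, domains and measures below are illustrative only, drawn loosely from the examples in this section):

```python
# Illustrative outcome hierarchy: broad group -> narrower domain -> measures.
OUTCOME_HIERARCHY = {
    "life impact": {
        "physical function": ["6-minute walk test", "Barthel Index"],
        "cognitive function": ["MMSE", "MoCA"],
    },
    "adverse events": {
        "any adverse event": ["number of participants with one or more events"],
    },
}

def classify(measure):
    """Return (broad group, domain) for a reported measure, or None."""
    for group, domains in OUTCOME_HIERARCHY.items():
        for domain, measures in domains.items():
            if measure in measures:
                return group, domain
    return None
```

Grouping at the domain level or the broad-group level then simply corresponds to reading off a different level of the mapping, which mirrors how the chosen level of grouping alters the synthesis question.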

In specifying outcome domains:

In a review of computer-based interventions for sexual health promotion, three broad outcome domains were defined (cognitions, behaviours, biological) based on a conceptual model of how the intervention might work. Each domain comprised more specific domains and outcomes (e.g. condom use, seeking health services such as STI testing); listing these helped define the broad domains and guided categorization of the diverse outcomes reported in included studies (Bailey et al 2010).

In a protocol for a review of social media interventions for improving health, the rationale for synthesizing broad groupings of outcomes (e.g. health behaviours, physical health) was based on prediction of a common underlying mechanism by which the intervention would work, and the review objective, which focused on overall health rather than specific outcomes (Welch et al 2018).

2. Determine whether there is an existing system for identifying and grouping important outcomes.

Systems for categorizing outcomes include core outcome sets, such as those developed by the COMET and ICHOM initiatives, and outcome taxonomies (Dodd et al 2018). These systems define agreed outcomes that should be measured for specific conditions (Williamson et al 2017). They can be used to standardize the varied outcome labels used across studies and enable grouping and comparison (Kirkham et al 2013). Agreed terminology may help decision makers interpret review findings.

The COMET website provides a database of core outcome sets agreed or in development. Some Cochrane Groups have developed their own outcome sets. While the availability of outcome sets and taxonomies varies across clinical areas, several taxonomies exist for specifying broad outcome domains (e.g. Dodd et al 2018, ICHOM 2018).

In a review of combined diet and exercise for preventing gestational diabetes mellitus, a core outcome set agreed by the Cochrane Pregnancy and Childbirth group was used (Shepherd et al 2017).

In a review of decision aids for people facing health treatment or screening decisions (Stacey et al 2017), outcome domains were based on criteria for evaluating decision aids agreed in the International Patient Decision Aid Standards (IPDAS). Doing so helped to assess the use of aids across diverse clinical decisions.

The Cochrane Consumers and Communication Group has an agreed taxonomy to guide specification of outcomes of importance in evaluating communication interventions (Cochrane Consumers & Communication Group).

3. Define the outcome time points.

A key attribute of defining an outcome is specifying the time of measurement. In reviews, time frames, and not specific time points, are often specified to handle the likely diversity in timing of outcome measurement across studies (e.g. a ‘medium-term’ time frame might be defined as including outcomes measured between 6 and 12 months).

In specifying outcome timing:

In a review of psychological therapies for panic disorder, the main outcomes were ‘short-term’ (≤6 months from treatment commencement). ‘Long-term’ outcomes (>6 months from treatment commencement) were considered important, but not specified as critical because of concerns of participant attrition (Pompoli et al 2018).

In contrast, in a review of antidepressants, a clinically meaningful time frame of 6 to 12 months might be specified for the critical outcome ‘depression’, since this is the recommended treatment duration. However, it may be anticipated that many studies will be of shorter duration with short-term follow-up, so an additional important outcome of ‘depression (<3 months)’ might also be specified.
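The time-frame approach can be sketched as a simple bucketing rule (the 6- and 12-month boundaries and the ‘first listed’ selection rule are assumptions for illustration):

```python
def time_frame(months):
    """Map a follow-up time in months onto a prespecified time frame.
    The 6- and 12-month boundaries are illustrative assumptions."""
    if months <= 6:
        return "short-term"
    if months <= 12:
        return "medium-term"
    return "long-term"

def one_per_frame(measurements):
    """Keep at most one (time, result) pair per time frame for a study,
    taking the first listed in each frame (a hypothetical selection rule)."""
    chosen = {}
    for months, result in measurements:
        chosen.setdefault(time_frame(months), (months, result))
    return chosen
```

Taking no more than one measurement per frame from each study avoids double-counting the same participants within a single synthesis.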

4. Specify the measurement tool or measurement method.

For each outcome domain, specify:

Minimum criteria for inclusion of a measure may include: evidence of reliability (e.g. consistent scores across time and raters when the outcome is unchanged) and validity (e.g. comparable results to similar measures, including a gold standard if available).

Measures may be identified from core outcome sets (e.g. Williamson et al 2017, ICHOM 2018) or systematic reviews of instruments (see COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative for a database of examples).

In a review of interventions to support women to stop smoking, objective (biochemically validated) and subjective (self-report) measures of smoking cessation were specified separately to examine bias due to the method used to measure the outcome (Step 6) (Chamberlain et al 2017).

In a review of high-intensity versus low-intensity exercise for osteoarthritis, measures of pain were selected based on relevance of the content and properties of the measurement tool (i.e. evidence of validity and reliability) (Regnaux et al 2015).

5. Specify how multiplicity of outcomes will be handled.

For a particular domain, multiple outcomes within a study may be available for inclusion. This may arise from:

Effects of the intervention calculated from these different sources of multiplicity are statistically dependent, since they have been calculated using the same participants. To deal with this dependency, select only one outcome per study for a particular comparison, or use a meta-analysis method that accounts for the dependency (see Step 6).

Pre-specify the method of selection from multiple outcomes or measures in the protocol, using an approach that is independent of the result (López-López et al 2018). Document all eligible outcomes or measures in the ‘Characteristics of included studies’ table, noting which was selected and why.

Multiplicity can arise from the reporting of multiple analyses of the same outcome (e.g. analyses that do and do not adjust for prognostic factors; intention-to-treat and per-protocol analyses) and multiple reports of the same study (e.g. journal articles, conference abstracts). Approaches for dealing with this type of multiplicity should also be specified in the protocol (López-López et al 2018).

It may be difficult to anticipate all forms of multiplicity when developing a protocol. Any post-hoc approaches used to select outcomes or results should be noted at the beginning of the Methods section, or if extensive, within an additional supplementary material.

The following hierarchy was specified to select one outcome per domain in a review examining the effects of portion, package or tableware size (Hollands et al 2015):

Selection of the outcome was made blinded to the results. All available outcome measures were documented in the ‘Characteristics of included studies’ table.
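Because the specific hierarchy items are not reproduced here, the sketch below uses invented outcome labels to show the general mechanism: walk the prespecified list in order and take the first outcome the study reports, so that selection never depends on the results themselves.

```python
# Invented hierarchy labels, for illustration only.
HIERARCHY = ["energy intake", "food consumed", "portion selected"]

def select_outcome(reported):
    """Return the highest-ranked reported outcome as (label, result), or None.

    `reported` maps outcome labels to results; selection depends only on
    which outcomes were measured, never on the results themselves.
    """
    for label in HIERARCHY:
        if label in reported:
            return label, reported[label]
    return None
```

Fixing the hierarchy in the protocol, before any results are seen, is what makes the selection rule independent of the result.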

In a review of audit and feedback for healthcare providers, the outcome domains were ‘provider performance’ (e.g. compliance with recommended use of a laboratory test) and ‘patient health outcomes’ (e.g. smoking status, blood pressure) (Ivers et al 2012). For each domain, outcomes were selected using the following hierarchy:

6. Plan how the specified outcome domains will be used in the synthesis.

When different measurement methods or tools have been used across studies, consideration must be given to how these will be synthesized. Options include the following.

There may be increased heterogeneity, warranting use of a random-effects model.

In a review of interventions to support women to stop smoking, separate outcome domains were specified for biochemically validated measures of smoking and self-report measures. The two domains were meta-analysed together, but sensitivity analyses were undertaken restricting the meta-analyses to studies with only biochemically validated outcomes, to examine if the results were robust to the method of measurement (Chamberlain et al 2017).

In a review of psychological therapies for youth internalizing and externalizing disorders, most studies contributed multiple effects (e.g. in one meta-analysis of 443 studies, there were 5139 included measures). The authors used multilevel modelling to address the dependency among multiple effects contributed from each study (Weisz et al 2017).

7. Where possible, build in contingencies by specifying both specific and broader outcome domains.

Consider building in flexibility to group outcomes at different levels or time intervals. Inflexible approaches can undermine the potential to synthesize, especially when few studies are anticipated, or when there is likely to be diversity in how outcomes are defined and measured and in the timing of measurement. If too few studies report data for meaningful synthesis using the narrower domains, the broader domains can be used.

Consider a hypothetical review aiming to examine the effects of behavioural psychological interventions for the treatment of overweight and obese adults. A specific outcome is body mass index (BMI). However, also specifying a broader outcome domain ‘indicator of body mass’ will facilitate synthesis in the circumstance where few studies report BMI, but most report an indicator of body mass (such as weight or waist circumference). This is particularly important when few studies may be anticipated or there is expected diversity in the measurement methods or tools.
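This contingency amounts to a simple fallback rule, sketched here with the BMI example (the minimum number of studies and the set of broader measures are assumptions for illustration):

```python
MIN_STUDIES = 2  # assumed minimum number of studies for a meaningful synthesis
BROADER = {"BMI", "weight", "waist circumference"}  # broader 'body mass' domain

def choose_domain(studies):
    """Pick the specific outcome (BMI) when enough studies report it;
    otherwise fall back to the broader 'indicator of body mass' domain.

    Each study is a dict with an 'outcomes' list of reported outcome labels.
    """
    bmi = [s for s in studies if "BMI" in s["outcomes"]]
    if len(bmi) >= MIN_STUDIES:
        return "BMI", bmi
    broader = [s for s in studies if BROADER & set(s["outcomes"])]
    return "indicator of body mass", broader
```

Prespecifying both levels in the protocol keeps the fallback transparent rather than appearing to be a data-driven choice.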

3.3 Determining which study designs to include

Some study designs are more appropriate than others for answering particular questions. Authors need to consider a priori what study designs are likely to provide reliable data with which to address the objectives of their review ( MECIR Box 3.3.a ). Sections 3.3.1 and 3.3.2 cover randomized and non-randomized designs for assessing treatment effects; Chapter 17, Section 17.2.5  discusses other study designs in the context of addressing intervention complexity.

MECIR Box 3.3.a Relevant expectations for conduct of intervention reviews

Predefining study designs

Predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review. This is particularly important when non-randomized studies are considered, because some labels commonly used to describe study designs are ambiguous. For example, a ‘double-blind’ study may not make clear who was blinded; a ‘case-control’ study may be nested within a cohort, or be undertaken in a cross-sectional manner; and a ‘prospective’ study may have only some features defined or undertaken prospectively.

Justifying choice of study designs

It might be difficult to address some interventions or some outcomes in randomized trials. Authors should be able to justify why they have chosen either to restrict the review to randomized trials or to include non-randomized studies. The particular study designs included should be justified with regard to appropriateness to the review question and with regard to potential for bias.

3.3.1 Including randomized trials

Because Cochrane Reviews address questions about the effects of health care, they focus primarily on randomized trials, and randomized trials should be included if it is feasible to conduct them for the interventions of interest (MECIR Box 3.3.b). Randomization is the only way to prevent systematic differences between the baseline characteristics of participants in different intervention groups with respect to both known and unknown (or unmeasured) confounders (see Chapter 8), so claims about cause and effect can be based on the findings of randomized trials with far more confidence than on those of almost any other type of study. For clinical interventions, deciding who receives an intervention and who does not is influenced by many factors, including prognostic factors. Empirical evidence suggests that, on average, non-randomized studies produce effect estimates indicating more extreme benefits of health care than randomized trials. However, the extent, and even the direction, of the bias is difficult to predict. These issues are discussed at length in Chapter 24, which provides guidance on when it might be appropriate to include non-randomized studies in a Cochrane Review.

Practical considerations also motivate the restriction of many Cochrane Reviews to randomized trials. In recent decades there has been considerable investment internationally in establishing infrastructure to index and identify randomized trials. Cochrane has contributed to these efforts, including building up and maintaining a database of randomized trials, developing search filters to aid their identification, working with MEDLINE to improve tagging and identification of randomized trials, and using machine learning and crowdsourcing to reduce author workload in identifying randomized trials (Chapter 4, Section 4.6.6.2). The same scale of organizational investment has not (yet) been matched for the identification of other types of studies. Consequently, including other types of studies may require additional efforts to identify them and to keep the review up to date, and might increase the risk that the result of the review will be influenced by publication bias. This issue and other bias-related issues that are important to consider when defining types of studies are discussed in detail in Chapter 7 and Chapter 13.

Specific aspects of study design and conduct should be considered when defining eligibility criteria, even if the review is restricted to randomized trials. For example, authors should decide whether cluster-randomized trials (Chapter 23, Section 23.1) and crossover trials (Chapter 23, Section 23.2) are eligible, and whether to apply other criteria such as use of a placebo comparison group, evaluation of outcomes blinded to allocation sequence, or a minimum period of follow-up. There will always be a trade-off between restrictive study design criteria (which might result in the inclusion of studies that are at low risk of bias, but very few in number) and more liberal design criteria (which might result in the inclusion of more studies, but at a higher risk of bias). Furthermore, excessively broad criteria might result in the inclusion of misleading evidence. If, for example, interest focuses on whether a therapy improves survival in patients with a chronic condition, it might be inappropriate to look at studies of very short duration, except to make explicit the point that they cannot address the question of interest.
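One way to keep such design criteria unambiguous is to express them as an explicit checklist applied to recorded design features, rather than to study labels. The sketch below is illustrative only: the fields, thresholds, and defaults are hypothetical protocol decisions, not MECIR requirements.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Recorded design features of a candidate study (illustrative fields)."""
    randomized: bool
    crossover: bool
    placebo_controlled: bool
    followup_weeks: int

def eligible(study: Study, min_followup_weeks: int = 52,
             allow_crossover: bool = False,
             require_placebo: bool = False) -> bool:
    """Apply predefined design criteria; each rule mirrors a protocol decision."""
    if not study.randomized:
        return False
    if study.crossover and not allow_crossover:
        return False
    if require_placebo and not study.placebo_controlled:
        return False
    return study.followup_weeks >= min_followup_weeks

# A four-week trial is excluded when the question concerns long-term
# outcomes; a two-year trial meets the follow-up criterion.
short_trial = Study(randomized=True, crossover=False,
                    placebo_controlled=True, followup_weeks=4)
long_trial = Study(randomized=True, crossover=False,
                   placebo_controlled=False, followup_weeks=104)
```

Writing the criteria as a predicate like this makes the trade-off tangible: loosening a parameter (e.g. reducing `min_followup_weeks`) admits more studies but potentially more misleading evidence.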

MECIR Box 3.3.b Relevant expectations for conduct of intervention reviews

Including randomized trials

Include randomized trials, if it is feasible to conduct them to evaluate the interventions and outcomes of interest.

Randomized trials are the best study design for evaluating the efficacy of interventions. If it is feasible to conduct them to evaluate questions that are being addressed by the review, they must be considered eligible for the review. However, appropriate exclusion criteria may be put in place, for example regarding length of follow-up.

3.3.2 Including non-randomized studies

The decision about whether to include non-randomized studies (and of what type) is made alongside the formulation of the review PICO. The main drivers for including non-randomized studies are: (i) when randomized trials are unable to address the effects of the intervention on harms and long-term outcomes, or in specific populations or settings; or (ii) when the intervention cannot be randomized (e.g. a policy change introduced in a single or small number of jurisdictions) (see Chapter 24). Cochrane, in collaboration with others, has developed guidance to support review authors in deciding when to look for and include non-randomized studies (Schünemann et al 2013).

What non-randomized designs have in common is that they do not use randomization to allocate units to comparison groups; beyond that, their differing design features mean that they vary in their susceptibility to bias. Eligibility criteria should therefore be based on explicit study design features, not on the study labels applied by the primary researchers (e.g. ‘case-control’, ‘cohort’), which are often used inconsistently (Reeves et al 2017; see Chapter 24).

When non-randomized studies are included, review authors should consider how the studies will be grouped and used in the synthesis. The Cochrane Non-randomized Studies Methods Group taxonomy of design features (see Chapter 24 ) may provide a basis for grouping together studies that are expected to have similar inferential strength and for providing a consistent language for describing the study design.

Once decisions have been made about grouping study designs, review authors need to plan how these groups will be used in the synthesis. They must decide whether it is useful to synthesize results from non-randomized studies and, if so, whether results from randomized trials and non-randomized studies should be included in the same synthesis (for the purpose of examining whether study design explains heterogeneity among the intervention effects) or synthesized in separate comparisons (Valentine and Thompson 2013). Decisions should be made for each of the different types of non-randomized studies under consideration. Review authors should anticipate increased heterogeneity when non-randomized studies are synthesized, so adopting a meta-analysis model that encompasses heterogeneity is wise (Valentine and Thompson 2013), such as a random-effects model (see Chapter 10, Section 10.10.4). For further discussion of non-randomized studies, see Chapter 24.
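A random-effects model of the kind suggested here can be sketched with the widely used DerSimonian-Laird estimator: the between-study variance (tau²) is estimated from Cochran's Q statistic and added to each study's within-study variance before pooling. The data below are hypothetical, chosen only to illustrate the mechanics.

```python
import math

def random_effects_dl(effects, ses):
    """DerSimonian-Laird random-effects pooling.

    tau^2 (between-study variance) is estimated from Cochran's Q;
    each study is then weighted by 1 / (within-study variance + tau^2)."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # truncated at zero
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical effect estimates: two randomized trials plus two
# non-randomized studies with more extreme effects and larger SEs.
pooled, se_pooled, tau2 = random_effects_dl(
    effects=[0.1, 0.3, 0.5, 0.9], ses=[0.1, 0.1, 0.2, 0.2])
```

When tau² is estimated to be greater than zero, the random-effects weights are more even across studies than fixed-effect weights, so precise studies dominate the pooled result less heavily, which is the behaviour wanted when extra heterogeneity from non-randomized designs is anticipated.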

3.4 Eligibility based on publication status and language

Chapter 4 contains detailed guidance on how to identify studies from a range of sources including, but not limited to, those in peer-reviewed journals. In general, a strategy to include studies reported in all types of publication will reduce bias (Chapter 7). There would need to be a compelling argument for the exclusion of studies on the basis of their publication status (MECIR Box 3.4.a), whether unpublished, partially published, or published in ‘grey’ literature sources. Given the additional challenge of obtaining unpublished studies, any unpublished studies identified in a given review may be an unrepresentative subset of all the unpublished studies in existence. However, the bias this introduces is of less concern than the bias introduced by excluding all unpublished studies, given what is known about the impact of reporting biases (see Chapter 13 on bias due to missing studies, and Chapter 4, Section 4.3 for a more detailed discussion of searching for unpublished and grey literature).

Likewise, while searching for, and analysing, studies in any language can be extremely resource-intensive, review authors should consider carefully the implications for bias (and equity, see Chapter 16) if they restrict eligible studies to those published in one specific language (usually English). See Chapter 4, Section 4.4.5, for further discussion of language and other restrictions while searching.

MECIR Box 3.4.a Relevant expectations for conduct of intervention reviews

Excluding studies based on publication status

Obtaining and including data from unpublished studies (including grey literature) can reduce the effects of publication bias. However, the unpublished studies that can be located may be an unrepresentative sample of all unpublished studies.

3.5 Chapter information

Authors: Joanne E McKenzie, Sue E Brennan, Rebecca E Ryan, Hilary J Thomson, Renea V Johnston, James Thomas

Acknowledgements: This chapter builds on earlier versions of the Handbook. In particular, Version 5, Chapter 5, edited by Denise O’Connor, Sally Green and Julian Higgins.

Funding: JEM is supported by an Australian National Health and Medical Research Council (NHMRC) Career Development Fellowship (1143429). SEB and RER’s positions are supported by the NHMRC Cochrane Collaboration Funding Program. HJT is funded by the UK Medical Research Council (MC_UU_12017-13 and MC_UU_12017-15) and Scottish Government Chief Scientist Office (SPHSU13 and SPHSU15). RVJ’s position is supported by the NHMRC Cochrane Collaboration Funding Program and Cabrini Institute. JT is supported by the National Institute for Health Research (NIHR) Collaboration for Leadership in Applied Health Research and Care North Thames at Barts Health NHS Trust. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

3.6 References

Bailey JV, Murray E, Rait G, Mercer CH, Morris RW, Peacock R, Cassell J, Nazareth I. Interactive computer-based interventions for sexual health promotion. Cochrane Database of Systematic Reviews 2010; 9 : CD006483.

Bender R, Bunce C, Clarke M, Gates S, Lange S, Pace NL, Thorlund K. Attention should be given to multiplicity issues in systematic reviews. Journal of Clinical Epidemiology 2008; 61 : 857–865.

Buchbinder R, Johnston RV, Rischin KJ, Homik J, Jones CA, Golmohammadi K, Kallmes DF. Percutaneous vertebroplasty for osteoporotic vertebral compression fracture. Cochrane Database of Systematic Reviews 2018; 4 : CD006349.

Caldwell DM, Welton NJ. Approaches for synthesising complex mental health interventions in meta-analysis. Evidence-Based Mental Health 2016; 19 : 16–21.

Chamberlain C, O’Mara-Eves A, Porter J, Coleman T, Perlen S, Thomas J, McKenzie J. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Ciciriello S, Johnston RV, Osborne RH, Wicks I, deKroo T, Clerehan R, O’Neill C, Buchbinder R. Multimedia educational interventions for consumers about prescribed and over-the-counter medications. Cochrane Database of Systematic Reviews 2013; 4 : CD008416.

Cochrane Consumers & Communication Group. Outcomes of Interest to the Cochrane Consumers & Communication Group: taxonomy. http://cccrg.cochrane.org/ .

COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) initiative. COSMIN database of systematic reviews of outcome measurement instruments. https://database.cosmin.nl/ .

Coulter A, Entwistle VA, Eccles A, Ryan S, Shepperd S, Perera R. Personalised care planning for adults with chronic or long-term health conditions. Cochrane Database of Systematic Reviews 2015; 3 : CD010523.

Davey J, Turner RM, Clarke MJ, Higgins JPT. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Medical Research Methodology 2011; 11 : 160.

Desroches S, Lapointe A, Ratte S, Gravel K, Legare F, Turcotte S. Interventions to enhance adherence to dietary advice for preventing and managing chronic diseases in adults. Cochrane Database of Systematic Reviews 2013; 2 : CD008722.

Deyo RA, Dworkin SF, Amtmann D, Andersson G, Borenstein D, Carragee E, Carrino J, Chou R, Cook K, DeLitto A, Goertz C, Khalsa P, Loeser J, Mackey S, Panagis J, Rainville J, Tosteson T, Turk D, Von Korff M, Weiner DK. Report of the NIH Task Force on research standards for chronic low back pain. Journal of Pain 2014; 15 : 569–585.

Dodd S, Clarke M, Becker L, Mavergames C, Fish R, Williamson PR. A taxonomy has been developed for outcomes in medical research to help improve knowledge discovery. Journal of Clinical Epidemiology 2018; 96 : 84–92.

Fisher DJ, Carpenter JR, Morris TP, Freeman SC, Tierney JF. Meta-analytical methods to identify who benefits most from treatments: daft, deluded, or deft approach? BMJ 2017; 356 : j573.

Fransen M, McConnell S, Harmer AR, Van der Esch M, Simic M, Bennell KL. Exercise for osteoarthritis of the knee. Cochrane Database of Systematic Reviews 2015; 1 : CD004376.

Guise JM, Chang C, Viswanathan M, Glick S, Treadwell J, Umscheid CA. Systematic reviews of complex multicomponent health care interventions. Report No. 14-EHC003-EF . Rockville, MD: Agency for Healthcare Research and Quality; 2014.

Harrison JK, Noel-Storr AH, Demeyere N, Reynish EL, Quinn TJ. Outcomes measures in a decade of dementia and mild cognitive impairment trials. Alzheimer’s Research and Therapy 2016; 8 : 48.

Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods 2010; 1 : 39–65.

Hetrick SE, McKenzie JE, Cox GR, Simmons MB, Merry SN. Newer generation antidepressants for depressive disorders in children and adolescents. Cochrane Database of Systematic Reviews 2012; 11 : CD004851.

Higgins JPT, López-López JA, Becker BJ, Davies SR, Dawson S, Grimshaw JM, McGuinness LA, Moore THM, Rehfuess E, Thomas J, Caldwell DM. Synthesizing quantitative evidence in systematic reviews of complex health interventions. BMJ Global Health 2019; 4 : e000858.

Hoffmann T, Glasziou P, Barbour V, Macdonald H. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

Hollands GJ, Shemilt I, Marteau TM, Jebb SA, Lewis HB, Wei Y, Higgins JPT, Ogilvie D. Portion, package or tableware size for changing selection and consumption of food, alcohol and tobacco. Cochrane Database of Systematic Reviews 2015; 9 : CD011045.

Howe TE, Shea B, Dawson LJ, Downie F, Murray A, Ross C, Harbour RT, Caldwell LM, Creed G. Exercise for preventing and treating osteoporosis in postmenopausal women. Cochrane Database of Systematic Reviews 2011; 7 : CD000333.

ICHOM. The International Consortium for Health Outcomes Measurement 2018. http://www.ichom.org/ .

IPDAS. International Patient Decision Aid Standards Collaboration (IPDAS) standards. www.ipdas.ohri.ca .

Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. Cochrane Database of Systematic Reviews 2012; 6 : CD000259.

Janmaat VT, Steyerberg EW, van der Gaast A, Mathijssen RH, Bruno MJ, Peppelenbosch MP, Kuipers EJ, Spaander MC. Palliative chemotherapy and targeted therapies for esophageal and gastroesophageal junction cancer. Cochrane Database of Systematic Reviews 2017; 11 : CD004063.

Kendrick D, Kumar A, Carpenter H, Zijlstra GAR, Skelton DA, Cook JR, Stevens Z, Belcher CM, Haworth D, Gawler SJ, Gage H, Masud T, Bowling A, Pearl M, Morris RW, Iliffe S, Delbaere K. Exercise for reducing fear of falling in older people living in the community. Cochrane Database of Systematic Reviews 2014; 11 : CD009848.

Kirkham JJ, Gargon E, Clarke M, Williamson PR. Can a core outcome set improve the quality of systematic reviews? A survey of the Co-ordinating Editors of Cochrane Review Groups. Trials 2013; 14 : 21.

Konstantopoulos S. Fixed effects and variance components estimation in three-level meta-analysis. Research Synthesis Methods 2011; 2 : 61–76.

Lamb SE, Becker C, Gillespie LD, Smith JL, Finnegan S, Potter R, Pfeiffer K. Reporting of complex interventions in clinical trials: development of a taxonomy to classify and describe fall-prevention interventions. Trials 2011; 12 : 125.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, Mckenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

López-López JA, Page MJ, Lipsey MW, Higgins JPT. Dealing with multiplicity of effect sizes in systematic reviews and meta-analyses. Research Synthesis Methods 2018; 9 : 336–351.

Mavridis D, Salanti G. A practical introduction to multivariate meta-analysis. Statistical Methods in Medical Research 2013; 22 : 133–158.

Michie S, van Stralen M, West R. The Behaviour Change Wheel: a new method for characterising and designing behaviour change interventions. Implementation Science 2011; 6 : 42.

Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, Eccles MP, Cane J, Wood CE. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine 2013; 46 : 81–95.

Moraes VY, Lenza M, Tamaoki MJ, Faloppa F, Belloti JC. Platelet-rich therapies for musculoskeletal soft tissue injuries. Cochrane Database of Systematic Reviews 2014; 4 : CD010071.

O'Neill J, Tabish H, Welch V, Petticrew M, Pottie K, Clarke M, Evans T, Pardo Pardo J, Waters E, White H, Tugwell P. Applying an equity lens to interventions: using PROGRESS ensures consideration of socially stratifying factors to illuminate inequities in health. Journal of Clinical Epidemiology 2014; 67 : 56–64.

Pompoli A, Furukawa TA, Imai H, Tajika A, Efthimiou O, Salanti G. Psychological therapies for panic disorder with or without agoraphobia in adults: a network meta-analysis. Cochrane Database of Systematic Reviews 2016; 4 : CD011004.

Pompoli A, Furukawa TA, Efthimiou O, Imai H, Tajika A, Salanti G. Dismantling cognitive-behaviour therapy for panic disorder: a systematic review and component network meta-analysis. Psychological Medicine 2018; 48 : 1–9.

Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series-paper 5: a checklist for classifying studies evaluating the effects on health interventions – a taxonomy without labels. Journal of Clinical Epidemiology 2017; 89 : 30–42.

Regnaux J-P, Lefevre-Colau M-M, Trinquart L, Nguyen C, Boutron I, Brosseau L, Ravaud P. High-intensity versus low-intensity physical activity or exercise in people with hip or knee osteoarthritis. Cochrane Database of Systematic Reviews 2015; 10 : CD010203.

Richards SH, Anderson L, Jenkinson CE, Whalley B, Rees K, Davies P, Bennett P, Liu Z, West R, Thompson DR, Taylor RS. Psychological interventions for coronary heart disease. Cochrane Database of Systematic Reviews 2017; 4 : CD002902.

Safi S, Korang SK, Nielsen EE, Sethi NJ, Feinberg J, Gluud C, Jakobsen JC. Beta-blockers for heart failure. Cochrane Database of Systematic Reviews 2017; 12 : CD012897.

Santesso N, Carrasco-Labra A, Brignardello-Petersen R. Hip protectors for preventing hip fractures in older people. Cochrane Database of Systematic Reviews 2014; 3 : CD001255.

Shepherd E, Gomersall JC, Tieu J, Han S, Crowther CA, Middleton P. Combined diet and exercise interventions for preventing gestational diabetes mellitus. Cochrane Database of Systematic Reviews 2017; 11 : CD010443.

Squires J, Valentine J, Grimshaw J. Systematic reviews of complex interventions: framing the review question. Journal of Clinical Epidemiology 2013; 66 : 1215–1222.

Stacey D, Légaré F, Lewis K, Barry MJ, Bennett CL, Eden KB, Holmes-Rovner M, Llewellyn-Thomas H, Lyddiatt A, Thomson R, Trevena L. Decision aids for people facing health treatment or screening decisions. Cochrane Database of Systematic Reviews 2017; 4 : CD001431.

Stroke Unit Trialists Collaboration. Organised inpatient (stroke unit) care for stroke. Cochrane Database of Systematic Reviews 2013; 9 : CD000197.

Taylor AJ, Jones LJ, Osborn DA. Zinc supplementation of parenteral nutrition in newborn infants. Cochrane Database of Systematic Reviews 2017; 2 : CD012561.

Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods 2013; 4 : 26–35.

Vaona A, Banzi R, Kwag KH, Rigon G, Cereda D, Pecoraro V, Tramacere I, Moja L. E-learning for health professionals. Cochrane Database of Systematic Reviews 2018; 1 : CD011736.

Verheyden GSAF, Weerdesteyn V, Pickering RM, Kunkel D, Lennon S, Geurts ACH, Ashburn A. Interventions for preventing falls in people after stroke. Cochrane Database of Systematic Reviews 2013; 5 : CD008728.

Weisz JR, Kuppens S, Ng MY, Eckshtain D, Ugueto AM, Vaughn-Coaxum R, Jensen-Doss A, Hawley KM, Krumholz Marchette LS, Chu BC, Weersing VR, Fordwood SR. What five decades of research tells us about the effects of youth psychological therapy: a multilevel meta-analysis and implications for science and practice. American Psychologist 2017; 72 : 79–117.

Welch V, Petkovic J, Simeon R, Presseau J, Gagnon D, Hossain A, Pardo Pardo J, Pottie K, Rader T, Sokolovski A, Yoganathan M, Tugwell P, DesMeules M. Interactive social media interventions for health behaviour change, health outcomes, and health equity in the adult population. Cochrane Database of Systematic Reviews 2018; 2 : CD012932.

Welton NJ, Caldwell DM, Adamopoulos E, Vedhara K. Mixed treatment comparison meta-analysis of complex interventions: psychological interventions in coronary heart disease. American Journal of Epidemiology 2009; 169 : 1158–1165.

Williamson PR, Altman DG, Bagley H, Barnes KL, Blazeby JM, Brookes ST, Clarke M, Gargon E, Gorst S, Harman N, Kirkham JJ, McNair A, Prinsen CAC, Schmitt J, Terwee CB, Young B. The COMET Handbook: version 1.0. Trials 2017; 18 : 280.

