AMSTAR: Assessing methodological quality of systematic reviews

Shea, B.J., Grimshaw, J.M., Wells, G.A., Boers, M., Andersson, N., Hamel, C., & Bouter, L.M. (2007). Development of AMSTAR: A measurement tool to assess the methodological quality of systematic reviews. BMC Medical Research Methodology, 7(10). doi: 10.1186/1471-2288-7-10

Description

The tool is an 11-item questionnaire used to assess the methodological quality of systematic reviews by checking for the presence of:

  • an a priori design;
  • duplicate study selection and data extraction;
  • a comprehensive literature search;
  • the use of publication status as an inclusion criterion;
  • a list of included/excluded studies;
  • characteristics of included studies;
  • documented assessment of the scientific quality of included studies;
  • appropriate use of the scientific quality in forming conclusions;
  • the appropriate use of methods to combine findings of studies;
  • assessment of the likelihood of publication bias; and
  • documentation of conflict of interest.

The AMSTAR tool was created by building on previous tools, empirical evidence and expert consensus. Over a decade has passed since the initial development of these types of tools, and more research has been conducted about potential sources of bias in systematic reviews. AMSTAR has incorporated these other sources of bias and will remain a "living document" subject to improvement as further advances in methodological research occur.

The instrument was developed using the following:

  1. the enhanced Overview Quality Assessment Questionnaire (OQAQ) by Oxman and Guyatt (1991)
  2. a checklist created by Sacks et al. (1987)
  3. three additional items recently judged to be of methodological importance:
    • language restriction
    • publication bias
    • publication status (inclusion of grey literature)

The tool was applied to 99 paper-based and 52 electronic systematic reviews. Exploratory factor analysis was used to identify underlying components. Methodological experts considered the results using a nominal group technique aimed at item reduction and design of an assessment tool with face and content validity.

Steps for Using Method/Tool

Users may print a copy of the 11-item tool and use it to critically appraise a systematic review. The tool supports a qualitative assessment of a systematic review's quality.

AMSTAR determines the methodological quality of systematic reviews by assessing the presence of the 11 items listed under Description above.

Note: Another group of authors (Kung et al., 2010) has developed a revised tool called "R-AMSTAR" that uses a quantitative scoring method to assess the quality of systematic reviews.
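To make the checklist use concrete, here is a minimal sketch in Python. The item wordings are abbreviated from the list under Description, and the simple yes/no response set plus the `summarize` helper are illustrative simplifications, not part of the published tool:

```python
# Hypothetical sketch of recording AMSTAR responses for one review.
# Item wordings are abbreviated; the yes/no response set is a simplification.
AMSTAR_ITEMS = [
    "A priori design",
    "Duplicate study selection and data extraction",
    "Comprehensive literature search",
    "Publication status used as an inclusion criterion",
    "List of included/excluded studies provided",
    "Characteristics of included studies provided",
    "Scientific quality of included studies assessed and documented",
    "Scientific quality used appropriately in forming conclusions",
    "Appropriate methods used to combine findings",
    "Likelihood of publication bias assessed",
    "Conflict of interest documented",
]

def summarize(responses):
    """Tally which items were met; AMSTAR itself is used qualitatively, not summed."""
    assert len(responses) == len(AMSTAR_ITEMS)
    met = [item for item, r in zip(AMSTAR_ITEMS, responses) if r == "yes"]
    return {"items_met": len(met), "met": met}

review = ["yes"] * 9 + ["no", "yes"]  # invented responses for one review
print(summarize(review)["items_met"])  # → 10
```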

Evaluation

Two reliability and validity evaluations of the AMSTAR tool have been conducted. Both evaluation studies include authors from the AMSTAR tool development study (Shea, Grimshaw, et al., 2007):

1) Shea, B.J., Bouter, L.M., Peterson, J., Boers, M., Andersson, N., et al. (2007). External Validation of a Measurement Tool to Assess Systematic Reviews (AMSTAR). PLoS ONE, 2(12): e1350. doi:10.1371/journal.pone.0001350

  • AMSTAR was used to appraise 42 reviews focused on therapies for gastro-esophageal reflux, peptic ulcer disease and other acid-related diseases.
  • Two assessors applied the AMSTAR to each review.
  • Two other assessors, plus a clinician and/or a methodologist, independently applied a global assessment to each review.
  • Reported outcomes included reliability (inter-observer kappas) and construct validity.

2) Shea, B.J., Hamel, C., Wells, G.A., Bouter, L.M., Kristjansson, E., Grimshaw, J., Henry, D.A., & Boers, M. (2009). AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. Journal of Clinical Epidemiology, 62(10), 1013-1020. Epub 2009 Feb 20.

  • Thirty systematic reviews were randomly selected from a database of 151 reviews that were used in the development of AMSTAR.
  • Each review was assessed by two reviewers using the following:

  1. the Overview Quality Assessment Questionnaire (OQAQ)
  2. the Sacks et al. instrument
  3. AMSTAR

  • Reported outcomes included reliability (inter-observer kappas), intra-class correlation coefficients of the sum scores, construct validity and completion times.

Validity

The following validity properties have been assessed:

1) Face validity—expert review for appropriateness; see method/tool development below

2) Content validity—extent to which a measure represents all facets of a given social concept; see method/tool development below

3) Construct (convergent) validity—ability of an instrument to measure an abstract concept or construct; assesses the overlap between two or more tests that presumably measure the same construct

In the 2009 Shea et al. study, construct validity was assessed by converting the mean total score of each of the 30 reviews to a percentage of the maximum score for each of the three instruments (AMSTAR, OQAQ and Sacks et al.). Intra-class correlation assessed convergence of the total scores between each pair of instruments (AMSTAR-OQAQ, AMSTAR-Sacks and OQAQ-Sacks). Similarly, in the 2007 Shea et al. study, construct validity was assessed by comparing AMSTAR with a global assessment tool.
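The percentage conversion and score convergence described above can be sketched in Python. This uses a generic one-way random-effects ICC, since the exact ICC variant used in the study is not specified here; the function names and the sample scores are illustrative:

```python
def pct_of_max(total, max_score):
    """Convert an instrument's total score to a percentage of its maximum."""
    return 100.0 * total / max_score

def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    `scores` is a list of per-review tuples, one value per instrument,
    e.g. (AMSTAR %, OQAQ %) for each review.
    """
    n = len(scores)            # reviews (targets)
    k = len(scores[0])         # instruments measured per review
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented AMSTAR totals (out of an assumed maximum of 11) paired with
# invented OQAQ percentage scores for five reviews:
pairs = [(pct_of_max(a, 11), oq)
         for a, oq in [(9, 80.0), (7, 65.0), (10, 95.0), (4, 40.0), (6, 50.0)]]
print(round(icc_oneway(pairs), 2))
```

A high ICC between two instruments' percentage scores indicates convergence, i.e. the instruments rank and rate the same reviews similarly.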

Reliability

The AMSTAR tool was found to have high inter-rater reliability, as measured by the kappa statistic, in both evaluation studies (Shea, Bouter, et al., 2007; Shea, Hamel, et al., 2009). The kappa statistic measures the level of agreement between two observers beyond what would be expected by chance. Kappa scores > 0.8 are considered to indicate almost perfect agreement.

In both evaluation studies, kappa scores ranged from moderate to almost perfect agreement for AMSTAR. In the 2007 Shea et al. study, nine out of 11 items scored a kappa > 0.75 and the overall scores had a kappa of 0.84. The 2009 Shea et al. study had an average kappa score of 0.70 for inter-rater agreement for individual items.
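For reference, inter-rater agreement of the kind reported above can be computed with Cohen's kappa. This is a generic sketch; the yes/no ratings below are invented, not data from the evaluation studies:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented yes/no ratings of one AMSTAR item across ten reviews:
a = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
b = ["yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes", "no"]
print(round(cohens_kappa(a, b), 2))  # → 0.52
```

Here the raters agree on 8 of 10 reviews (0.80 observed agreement), but because both raters say "yes" often, much of that agreement is expected by chance, so kappa is only moderate.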

These summaries are written by the NCCMT to condense and to provide an overview of the resources listed in the Registry of Methods and Tools and to give suggestions for their use in a public health context. For more information on individual methods and tools included in the review, please consult the authors/developers of the original resources.

We have provided the resources and links as a convenience and for informational purposes only; they do not constitute an endorsement or an approval by McMaster University of any of the products, services or opinions of the external organizations, nor have the external organizations endorsed their resources and links as provided by McMaster University. McMaster University bears no responsibility for the accuracy, legality or content of the external sites.
