Thursday, March 25, 2010

A Checklist for Critically Reading Quantitative Research

The following is an attempt to create a heuristic for evaluating the quality of quantitative social scientific research, assuming no background in social scientific methods on the reader's part.

1. Is the information contained solely in a press release and/or interview, or is a report or article in a scholarly journal available? If a report or journal article is available, was the research reviewed by knowledgeable peers?

2. Is the basic methodological information required by the American Association for Public Opinion Research's Standards for Minimal Disclosure available, either in the report or a methodological appendix?

a. Who sponsored the survey, and who conducted it. Commentary: Search for information on the researcher and on the company that actually carried out the data collection, if different. Has either been cited by AAPOR for violating its Code of Professional Ethics and Practices? Alternatively, have company principals held posts in professional associations like AAPOR (e.g., Tom Smith, Mark Shulman) and/or published methodological articles and given presentations at meetings like those of the American Statistical Association? Does the research firm used primarily conduct political polls, market research, or academic research? Naturally, also look for commentaries on the specific piece of research. Where research is sponsored by an organization with an agenda that might influence it (i.e., just about all Jewish communal research), does the researcher explain the nature of their relationship with the sponsor?

b. The exact wording of questions asked, including the text of any preceding instruction or explanation to the interviewer or respondents that might reasonably be expected to affect the response. Commentary: Are the questions worded in a way that might bias the answers? If the results are very different from previous research, has the author used standard questions? If not, does the author justify the new items? Are there any skip patterns that fail to collect information from important populations? If the researcher combines items to form a scale or index, does s/he explain exactly how it was done? Does s/he report measures of scale reliability (e.g., Cronbach's/coefficient alpha > .75; see the computation sketched after this list)? Does the index/scale actually seem to reflect what the researcher says it does? Quite often, it doesn't.

c. A definition of the population under study, and a description of the sampling frame used to identify this population. Commentary: Is there any reason to think that the sample might differ systematically from the population it is intended to represent? Was it a convenience sample (a.k.a. an open sample)? Was it drawn from an opt-in Internet panel? Does the sample include only Jews by religion? If there is a reasonable probability of bias, is it likely to have increased or decreased the reported results of the study? Does the author address any possible limitations of the sample?

d. Sample sizes and, where appropriate, eligibility criteria, screening procedures, and response rates computed according to AAPOR Standard Definitions. At a minimum, a summary of the disposition of sample cases should be provided so that response rates can be computed. Commentary: The lack of a response rate is a major red flag. The response rate should be accompanied by a specification of the exact response rate formula used (e.g., AAPOR RR3; see the sketch after this list). Better research will not only list the response rate but will also address possible nonresponse biases.

e. A discussion of the precision of the findings, including estimates of sampling error, and a description of any weighting or estimating procedures used. Commentary: If the survey uses an opt-in Internet panel or a convenience sample, estimates of sampling error are inappropriate and should never be reported. Discussion of weights may distinguish between design weights (which correct for unequal probabilities of selection) and poststratification weights (which correct for coverage and nonresponse biases). If poststratification weights are used, the discussion should specify which variables were used and how the targets for adjustment were derived. (A sketch after this list shows how unequal weights inflate the margin of error.)

f. Which results are based on parts of the sample, rather than on the total sample, and the size of such parts. Commentary: If analyses exclude parts of the sample, is there a reasonable justification for their exclusion?

g. Method, location, and dates of data collection. Commentary: How long was the field period? How many contact attempts were made? Did the researcher try to convert refusals? Overall, does it appear that sufficient effort was put into collecting the data?
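To make the reliability criterion in 2(b) concrete, here is a minimal Python sketch of how coefficient (Cronbach's) alpha is computed from a respondents-by-items matrix of scores. The ratings below are invented for illustration, and the .75 cutoff is only a rule of thumb.

    import numpy as np

    def cronbach_alpha(items):
        """Coefficient alpha for an n_respondents x k_items matrix of scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)      # variance of each item
        total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical 5-point ratings from six respondents on a three-item scale.
    ratings = [[4, 5, 4],
               [2, 2, 3],
               [5, 4, 5],
               [3, 3, 2],
               [1, 2, 1],
               [4, 4, 5]]
    print(round(cronbach_alpha(ratings), 2))  # values below ~.75 suggest a weak scale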
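For the response rates in 2(d), this sketch computes AAPOR Response Rate 3 from a case disposition summary, assuming the standard formula RR3 = I / ((I + P) + (R + NC + O) + e(UH + UO)). The disposition counts and the eligibility estimate e are invented; the authoritative formulas are in AAPOR's Standard Definitions.

    def aapor_rr3(complete, partial, refusal, noncontact, other,
                  unknown_household, unknown_other, e):
        """AAPOR Response Rate 3: completed interviews as a share of all
        estimated eligible cases. e is the estimated proportion of
        unknown-eligibility cases that are actually eligible."""
        eligible_nonrespondents = refusal + noncontact + other
        unknown = unknown_household + unknown_other
        return complete / (complete + partial + eligible_nonrespondents + e * unknown)

    # Invented disposition counts for illustration only.
    print(round(aapor_rr3(complete=600, partial=50, refusal=400, noncontact=300,
                          other=50, unknown_household=200, unknown_other=100,
                          e=0.5), 3))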
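And for the precision discussion in 2(e), the sketch below gives a back-of-the-envelope margin of error for a proportion, inflated by Kish's approximate design effect for unequal weighting. It assumes a probability sample (so a margin of error is meaningful at all), and the weights and the 40% estimate are made up.

    import numpy as np

    def approx_margin_of_error(p, weights, z=1.96):
        """95% margin of error for a proportion p, inflated by Kish's
        approximate design effect due to unequal weighting."""
        w = np.asarray(weights, dtype=float)
        n = len(w)
        deff = n * (w ** 2).sum() / w.sum() ** 2  # 1 + relative variance of the weights
        return z * np.sqrt(deff * p * (1 - p) / n)

    # Hypothetical weights for 1,000 respondents and an estimate of 40%.
    rng = np.random.default_rng(0)
    weights = rng.uniform(0.5, 2.0, size=1000)
    print(round(approx_margin_of_error(0.40, weights), 3))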

3. If the researcher describes something as a well-established fact, is it in fact widely supported? Often, it is not. Do a quick fact-check.

4. Does the researcher actually present evidence that directly supports their conclusions? Is there any evidence in the research that contradicts the researcher’s assertions? (This is surprisingly common.) Look carefully for situations where some related data is shown, “hand-waving” takes place (i.e., unsupported assertions are made), and a conclusion is stated definitively.

a. For processes involving changes over time, is there actually evidence of change over time, or does the researcher base her/his analyses on differences between age groups? If the latter, could the differences between age groups reflect lifecycle effects rather than real change over time?

b. For processes involving individuals, does the researcher base their conclusions on aggregated data? (Inferring individual behavior from aggregate data risks the ecological fallacy.)

5. Does the researcher omit any relevant outcomes? First, were there questions asked that aren’t reported? Second, were any topics simply not asked about? Is there a reason to expect that the omitted topics might diverge from the reported results?

6. If the researcher asserts that X caused Y, does s/he control for other factors that might be expected to influence the outcome?

a. Simple bivariate analyses are far more prone to biases than are regression analyses with suitable controls, which tend to be quite robust, even for biased samples. (See the sketch after this list for a toy example of a bivariate relationship that disappears once a confounder is controlled.)

b. How important are exact estimates to the researcher’s findings? The more research depends on a specific figure (e.g., the number of Jews in the United States), the more vulnerable it is to shortcomings in data collection.

c. Did the researcher omit any relevant explanatory variables? First, were there any topics asked about that weren’t included in the analysis? (Typically, where a variable is not significant, a researcher will say something like “X, Y, and Z did not have a significant effect and are omitted; model not shown.”) Second, were there any potentially relevant explanatory variables omitted? If so, what explanatory variables included in the analysis might be picking up the omitted variables’ effects?

d. Does the researcher report whether effects were statistically significant? This is especially important where sample sizes are small. (Note that statistical significance isn’t appropriate in cases where all relevant units were surveyed, like many Birthright reports, or for convenience and other nonprobability samples.)

e. If the sample size is very large, are the effects reported large enough to be meaningful?
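As a toy illustration of 6(a), the following sketch generates data in which a third variable drives both X and Y: the bivariate slope looks substantial, but it collapses toward zero once the confounder is controlled. Everything here (the data, the confounder, the model) is invented for demonstration.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 5000

    # Hypothetical confounder (e.g., age) that influences both x and y.
    confounder = rng.normal(size=n)
    x = 0.8 * confounder + rng.normal(size=n)
    y = 1.0 * confounder + rng.normal(size=n)  # y does not depend on x directly

    def ols_slopes(y, *predictors):
        """Ordinary least squares coefficients (excluding the intercept)."""
        X = np.column_stack([np.ones(len(y)), *predictors])
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coefs[1:]

    print("bivariate slope of y on x:           ", round(ols_slopes(y, x)[0], 2))
    print("slope of x, controlling for confounder:", round(ols_slopes(y, x, confounder)[0], 2))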

7. For evaluation research on the effectiveness of a program or policy:

a. How does the researcher estimate the program effect? In declining order of rigor, the main designs are:

i. True experiments that randomly allocate individuals or other units like communities into treatment or control groups.

ii. Quasi-experimental designs that have nonrandom treatment and control groups and measure outcomes before and after the treatment intervention begins. Does the researcher document whether systematic differences exist between the treatment and control groups? If there are differences, what steps does the researcher take to account for them? (A simple difference-in-differences calculation, sketched at the end of this checklist, is one common way to analyze this design.)

iii. Quasi-experimental designs that either measure participants before and after the treatment without a control group, or measure treatment and control groups only after the intervention. For participant-only pre/post designs, are there other factors, such as aging or a major event like a terrorist attack, that could also explain the results? For treatment-and-control post-only designs, does the researcher model the characteristics of the groups and control for them when analyzing outcomes?

iv. Treatment-only post-only designs that ask retrospective questions about attitudes and behavior. Are the events being recalled memorable? Is there a reason to believe that respondents may have recall errors, like telescoping events (remembering them as more recent than they actually were)?

v. Treatment-only post-only designs that ask about program satisfaction and self-perceptions of program effect. These are very weak.

b. Does the researcher generalize about the program’s effect beyond the type of people who actually participated? For instance, does a study of a program of outreach to intermarried Reform synagogue members claim that the program will work for all intermarried families? Is this a reasonable assumption?
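Finally, as an illustration of the pre/post, treatment-and-control comparison in 7(a)(ii), here is a minimal difference-in-differences calculation from group means. The scores are invented, and difference-in-differences is only one common way to analyze such a design, not necessarily what any particular researcher did.

    # Hypothetical mean outcome scores (e.g., an attitude scale), for illustration only.
    treatment_pre, treatment_post = 3.1, 3.9
    control_pre, control_post = 3.0, 3.3

    # Change within each group, then the difference between those changes.
    treatment_change = treatment_post - treatment_pre   # about 0.8
    control_change = control_post - control_pre         # about 0.3
    program_effect = treatment_change - control_change  # about 0.5

    print(f"Estimated program effect: {program_effect:.1f}")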