How to critique

Here are some pointers on the things to look for when discussing or reviewing a paper.


Generally discussants have 10 – 15 minutes to give comments on a paper, sometimes less. With that much time you can make 3 good comments. You should not use this time to say everything you liked or did not like about a paper, and you should not get lost in the weeds. If you describe errors, you have to get to the “so what”: the fact that there is an error is not in itself of interest. You should select your comments so that:

  1. they open up a conversation
  2. they speak to the major issues the paper addresses
  3. they provide pointers to how to do better.

Remember: as a discussant, it is not about you; it is about making the paper better and helping people understand its strengths and limitations. Mostly it’s about the speaker. If you think the paper is great, you do not have to drum up a critique, but you should still try to help people see why it is great. Having slides helps organize your presentation and helps people follow. A single slide with three bullets on the three big points is enough. If you have a laundry list of smaller points, share it with the speaker afterwards.

Same language, different perspective: The really useful critiques often come from taking a genuinely fresh perspective on a piece of work. This requires stepping back and not becoming beholden to the author’s spin on their findings. It is often useful to ask: what is this a case of? What is the general class of phenomena this speaks to? If you had lots of resources, how would you address the question? If you could set it up as an experiment, how would you do it? If you really had to take a policy action based on this work, which elements would give you pause? But as you take different perspectives, you should try to speak the same language; otherwise you can end up talking to yourself and influencing no one.


For a formal review or referee report you have space to go into much more depth. A standard approach is to divide these reviews into three parts.

  • The first part can be a single paragraph: it summarizes the key contribution of the paper as you see it, gives an overall assessment, and points to the key issues, concerns, or strengths. Don’t forget the strengths. Try to articulate succinctly what you know now that you didn’t know before you read the piece. Often a quick summary can draw attention to strong features you were not conscious of, or make you realize that what you were impressed by is not so impressive after all.
  • The second part discusses 3 – 6 major features of the paper; the checklist below lists features that could be useful to think through when selecting themes. Try to organize by theme (measurement, explanation, etc.).
  • The third part is for “smaller issues”, where you can list in bullet points everything from ambiguities to estimation issues to pointers to other work.

Other things:

  • It’s useful to authors when you can point to relevant literature they have not read.
  • It’s useful to authors to know what to cut: reviewers tend to worry about length but still ask for more.
  • Your tone should be such that you would not feel embarrassed if someday your review gets into the public domain by mistake.
  • You should feel free to ask for extra material such as replication data or analysis plans. Sometimes reviewing can go quicker if you can access data.
  • Don’t ask the authors to ask and answer a different question; respond to the paper you have been sent.
  • Be generous: share references if they are missing but don’t assume that researchers intentionally ignored the work of others (or your work!); raise ethical issues if you see them but don’t assume researchers acted without ethical concern; ask for multiple comparisons corrections but don’t assume deliberately misleading reporting.
  • Pronouns. For anonymous review it’s usually safe to use the pronouns “you” or “they”, even if single authorship has been indicated.

The Checklist

Here is my list of what to look out for as I read a paper:

Theory

  • Is the theory internally consistent?
  • Is it consistent with past literature and findings?
  • Is it novel or surprising?
  • Are elements that are excluded or simplified plausibly unimportant for the outcomes?
  • Is the theory general or specific? Are there more general theories on which this theory could draw or contribute?

From Theory to Hypotheses

  • Is the theory really needed to generate the hypotheses?
  • Does the theory generate more hypotheses than considered?
  • Are the hypotheses really implied by the theory? Or are there ambiguities arising from, say, non-monotonicities or multiple equilibria?
  • Does the theory specify mechanisms?
  • Does the theory suggest heterogeneous effects?

Hypotheses

  • Are the hypotheses complex? (e.g., in fact two or three hypotheses bundled together)
  • Are the hypotheses falsifiable?

Evidence I: Design

  • External validity: Is the population examined representative of the larger population of interest?
  • External validity: Are the conditions under which they are examined consistent with the conditions of interest?
  • Measure validity: Do the measures capture the objects specified by the theory?
  • Consistency: Is the empirical model used consistent with the theory?
  • Mechanisms: Are mechanisms tested? How are they identified?
  • Replicability: Has the study been done in a way that it can be replicated?
  • Interpretation: Do the results admit rival interpretations?

Evidence II: Analysis and Testing

  • Identification: Are there concerns about reverse causality?
  • Identification: Are there concerns about omitted variable bias?
  • Identification: Does the model control for pre-treatment variables only? Does it control or does it match?
  • Identification: Are poorly identified claims flagged as such?
  • Robustness: Are results robust to changes in the model, to subsetting the data, to changing the period of measurement or of analysis, to the addition or exclusion of plausible controls?
  • Standard errors: Does the calculation of test statistics make use of the design? Do standard errors take account of plausible clustering structures or differences in levels?
  • Presentation: Are the results presented in an intelligible way, e.g., using fitted values or graphs? How can this be improved?
  • Interpretation: Can a finding of no evidence of an effect be interpreted as evidence of only weak effects?

Evidence III: Other sources of bias

  • Fishing: Were hypotheses generated prior to testing? Was any training data separated from test data?
  • Measurement error: Is error from sampling, case selection, or missing data plausibly correlated with outcomes?
  • Spillovers / Contamination: Is it plausible that outcomes in control units were altered because of the treatment received by the treated?
  • Compliance: Did the treated really get treatment? Did the controls really not?
  • Hawthorne effects: Are subjects modifying behavior simply because they know they are under study?
  • Measurement: Is treatment the only systematic difference between treatment and control or are there differences in how items were measured?
  • Implications of Bias: Are any sources of bias likely to work for or against the hypothesis tested?

Mechanisms

  • Does the evidence support the particular causal account given?
  • Are mechanisms examined? Can they be?
  • Are there observable implications we might expect to see associated with different possible mechanisms?

Policy Implications

  • Do the policy implications really follow from the results?
  • If implemented, would the policy changes have effects other than those specified by the research?
  • Have the policy claims been tested directly?
  • Is the author overselling or underselling the findings?