
Corey Dethier – “Interpreting the Probabilistic Language in IPCC Reports”

A young sibyl (sacred interpreter of the word of god in pagan religions) argues with an old prophet (sacred interpreter of the word of god in monotheistic religions). It looks as if the discussion will go on for a long while.
Detail of “A sibyl and a prophet” (ca. 1495) by Andrea Mantegna

In this post, Corey Dethier discusses his article recently published in Ergo. The full-length version of Corey’s article can be found here.

Every few years, the Intergovernmental Panel on Climate Change (IPCC) releases reports on the current status of climate science. These reports are massive reviews of the existing literature by the most qualified experts in the field. As such, IPCC reports are widely taken to represent our best understanding of what the science currently tells us. For this reason, the IPCC’s findings are important, as is their method of presentation.

The IPCC typically qualifies its findings using different scales. In its 2013 report, for example, the IPCC says that the sensitivity of global temperatures to increases in CO2 concentration is “likely in the range 1.5°C to 4.5°C (high confidence), extremely unlikely less than 1°C (high confidence) and very unlikely greater than 6°C (medium confidence)” (IPCC 2013, 81).
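
For reference, the IPCC’s uncertainty guidance for the Fifth Assessment Report glosses these likelihood terms with numerical probability ranges. The snippet below is just my rough summary of that scale, offered for orientation; as I argue below, these glosses do not by themselves settle how the terms should be interpreted.

# Likelihood terms and their probability ranges, per the uncertainty guidance
# prepared for the IPCC's Fifth Assessment Report. Background only: the ranges
# do not say whether the probabilities are frequencies or degrees of belief.
LIKELIHOOD_SCALE = {
    "virtually certain":      (0.99, 1.00),
    "very likely":            (0.90, 1.00),
    "likely":                 (0.66, 1.00),
    "about as likely as not": (0.33, 0.66),
    "unlikely":               (0.00, 0.33),
    "very unlikely":          (0.00, 0.10),
    "extremely unlikely":     (0.00, 0.05),
    "exceptionally unlikely": (0.00, 0.01),
}

def gloss(term: str) -> str:
    """Return the official numerical gloss for a likelihood term."""
    low, high = LIKELIHOOD_SCALE[term]
    return f"'{term}' = {low:.0%} to {high:.0%} probability"

for term in ("likely", "extremely unlikely", "very unlikely"):
    print(gloss(term))
# Prints:
#   'likely' = 66% to 100% probability
#   'extremely unlikely' = 0% to 5% probability
#   'very unlikely' = 0% to 10% probability

Even with these glosses in hand, it is not obvious what kind of probability is being reported.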

You might wonder what exactly these qualifications mean. On what grounds does the IPCC say that something is “likely” as opposed to “very likely”? And why does it assign “high confidence” to some claims and “medium confidence” to others? If you do wonder about this, you are not alone. Even many of the scientists involved in writing the IPCC reports find these qualifications confusing (Janzwood 2020; Mach et al. 2017). My recent paper – “Interpreting the Probabilistic Language in IPCC Reports” – aims to clarify this issue, with particular focus on the IPCC’s appeal to the likelihood scale.

Traditionally, probabilistic language such as “likely” has been interpreted in two ways. On a frequentist interpretation, something is “likely” when it happens with relatively high frequency in similar situations, while it is “very likely” when it happens with a much greater frequency. On a personalist interpretation, something is “likely” when you are more confident that it will happen than not, while something is “very likely” when you are much more confident.

Which of these interpretations better fits the IPCC’s practice? I argue that neither does. My main reason is that both interpretations are closely tied to specific statistical methodologies: the frequentist interpretation is appropriate for “classical” statistical testing, whereas the personalist interpretation is appropriate when “Bayesian” methods are used. The details of the differences between these methods do not matter for present purposes. What matters is that climate scientists use both kinds of statistics in their research, and since the IPCC’s reports review all of the relevant literature, the same language ends up summarizing results derived by both methods. No interpretation tied to just one of these methodologies can therefore capture what the IPCC means across the board.

If neither of the traditional interpretations works, what should we use instead? My suggestion is that we should understand the IPCC’s probabilistic terms more like letter grades (an A, a B, a C, etc.) than as strict probabilistic claims that presuppose a particular statistical methodology.

An A in geometry or English suggests that a student is well-versed in the subject according to the standards of the class. If the standards are sufficiently rigorous, we can conclude that the student will probably do well when faced with new problems in the same subject area. But an A in geometry does not mean that the student will correctly solve geometry problems with a given frequency, nor does it specify an appropriate amount of confidence that you should have that they’ll solve a new geometry problem. 

The IPCC’s use of terms such as “likely” is similar. When the IPCC says that a claim is likely, that’s like saying that it got a C in a very hard test. When the IPCC says that sensitivity is “extremely unlikely less than 1°C”, that’s like saying that this claim fails the test entirely. In this analogy, the IPCC’s judgments of confidence reflect the experts’ evaluation of the quality of the class or test: “high confidence” means that the experts think that the test was very good. But even when a claim passes the test with full marks, and the test is judged to be very good, this only gives us a qualitative evaluation. Just as you shouldn’t conclude that an A student will get 90% of problems right in the future, you also shouldn’t conclude that something that the IPCC categorizes as “very likely” will happen at least 90% of the time. The judgment has an important qualitative component, which a purely numerical interpretation would miss.

It would be nice – for economists, for insurance companies, and for philosophers obsessed with precision – if the IPCC could make purely quantitative probabilistic claims. At the end of my paper, I discuss whether the IPCC should strive to do so. I’m on the fence: there are both costs and benefits. Crucially, however, my analysis suggests that this would require the IPCC to go beyond its current remit: in order to present results that allow for a precise quantitative interpretation of its probability claims, the IPCC would have to do more than simply summarize the current state of the research. 

Want more?

Read the full article at https://journals.publishing.umich.edu/ergo/article/id/4637/.

References

  • IPCC (2013). Climate Change 2013: The Physical Science Basis. Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Thomas F. Stocker, Dahe Qin, et al. (Eds.). Cambridge University Press.
  • Janzwood, Scott (2020). “Confident, Likely, or Both? The Implementation of the Uncertainty Language Framework in IPCC Special Reports”. Climatic Change 162, 1655–75.
  • Mach, Katharine J., Michael D. Mastrandrea, et al. (2017). “Unleashing Expert Judgment in Assessment”. Global Environmental Change 44, 1–14.

About the author

Corey Dethier is a postdoctoral fellow at the Minnesota Center for Philosophy of Science. He has published on a variety of topics relating to epistemology, rationality, and scientific method, but his main research focus is on epistemological and methodological issues in climate science, particularly those raised by the use of idealized statistical models to answer questions about climate change.


Naftali Weinberger – “Signal Manipulation and the Causal Analysis of Racial Discrimination”

Picture of fragmented parts of a Caucasian woman's face, rearranged and surrounded by pearls.
“Sheherazade” (1950) by René Magritte

In this post, Naftali Weinberger discusses the article he recently published in Ergo. The full-length version of Naftali’s article can be found here.

After the first presidential debate between Hillary Clinton and Donald Trump, the consensus was that Clinton came out ahead, but that Trump exceeded expectations. Some sensed sexism, claiming that had Trump been a woman and Clinton a man, there is no way observers would have thought the debate was even close, given the difference in the candidates’ policy backgrounds.

How could we test this hypothesis? Some professors at NYU staged a play with the candidates’ genders swapped: a female actor played Trump, imitating his words and gestures, and a male actor played Clinton. Afterwards, participants were given a questionnaire. Surprisingly, audience members disliked the male Clinton more than observers of the initial debate had disliked the original. “Why is he smiling so much?” some asked. And: “Isn’t he a bit effeminate?”

Does this show there was no sexism? Here we need to be careful. Smiling is not gender-neutral, since norms for how much people are expected to smile are themselves gendered. So perhaps we need to rerun the experiment and not only swap the actors’ genders but also modify the gestures in gender-conforming ways, so that the male Clinton smiles less. The worry is that the list of required modifications might be open-ended. The public persona Clinton has developed over the last half-century is not independent of her gender. If we start changing every feature that gets interpreted through a gendered lens, we may end up changing all of them.

This example illustrates how tricky it can be to test claims about the effects of demographic variables such as gender and race. I wrote “Signal Manipulation and the Causal Analysis of Racial Discrimination” because I believe it is crucial to be able to empirically test at least some claims about discrimination, and that causal methods are necessary for doing so.

Studying racial discrimination requires one to bring together research from disparate academic areas. Whether race can be treated as a causal variable is a question for causal inference. What race is, is a question for sociologists. Why we care specifically about discrimination against protected categories such as race is a matter for legal theorists and political philosophers.

Let’s start with whether race can be causal. Causal claims are typically tested by varying one factor while keeping others fixed. For instance, in a clinical trial one randomly assigns participants to receive either the drug or a placebo. But does it make sense to vary just someone’s race or gender while keeping everything else about them fixed?

This concern is often framed in terms of whether it is possible to experimentally manipulate race, and some claim that all causal variables must be potentially manipulable. I argue that manipulability is not the primary issue at stake in modeling discrimination. Rather, certain failures of manipulability point to a deeper problem in understanding race causally. Specifically, causal reasoning involves disentangling causal and merely evidential relevance: Does taking the drug promote recovery, or is it just that learning someone took the drug is evidence they were likely to recover (due to being healthier initially)? If one really could not change someone’s race without changing everything about them, the distinction between causal and evidential relevance would collapse.

We now turn to what race is. A key debate concerns whether it is biologically essential or socially constructed. Some think that race is non-manipulable only if it is understood biologically. Maya Sen and Omar Wasow argue that race is a socially constructed composite, and that even though one cannot intervene on the whole, one can manipulate components (e.g. dialect). Sen and Wasow do not theorize about the relationship between race and its components, and I believe this is by design. The underlying presupposition is that if race is constructed, it is nothing over and above the components through which it is socially mediated.

Yet race’s being socially constructed does not entail that it reduces to its social manifestations. To give Ron Mallon’s example: a dollar’s value is socially constructed, but this does not entail that there is nothing more to being a dollar than being perceived as one. Within our socially constructed value system, a molecule-for-molecule perfect counterfeit is still a counterfeit. The upshot of this is that even if race is a composite such that we can only manipulate particular components, it does not follow that race just is its components. The relationship between social construction and manipulability is more nuanced than has been presupposed.

Finally, how does the causal status of race connect to legal theories of discrimination? Discrimination law only makes sense given a distinction between discrimination on the basis of protected categories and mere arbitrary treatment. An employer who does not hire someone because the applicant simply annoys them might be irrational, but is not violating discrimination law. I argue that in order to distinguish between racial discrimination and arbitrary treatment, we need to be able to talk about whether race itself made a difference. This involves varying it independently of other factors and thus modeling it causally.

Where does this leave us with Clinton and Trump? I’d suggest that if we really can’t change Clinton’s perceived gender without changing everything about her, we cannot disentangle causal from evidential relevance, and causal reasoning does not apply. Fortunately, not all cases are like this. In audit studies, one can change a racially relevant cue (such as the name on a resume) to plausibly change only the racial information the employer receives. And this does not entail that race is only the name. Instead of asking whether race is a cause, we should ask when it is fruitful to model race causally, with a spectrum from cases like audit studies (in which it is) to cases like Clinton’s (in which it isn’t).  And even in audit studies, treating race as separable is an idealization, since one does not model it in all of its sociological complexity. If what I argue in the article is correct, however, this modeling exercise is indispensable for legally analyzing discrimination and designing interventions to mitigate it.

Want more?

Read the full article at https://journals.publishing.umich.edu/ergo/article/id/2915/.

About the author

Naftali Weinberger is a scientific researcher at the Munich Center for Mathematical Philosophy. His work concerns the use of causal methodology to address foundational questions arising in the philosophy of science as well as questions arising in particular sciences, including: biology, psychometrics, neuroscience, and cognitive science. He has two primary research projects – one on causation in complex dynamical systems and another on the use of causal methods for the analysis of racial discrimination. He is currently trying to convince philosophers that causal representations are implicitly relative to a particular time-scale and that it is therefore crucial to pay attention to temporal dynamics when designing and evaluating policy interventions.