Science is brutal.

It doesn’t care about you or me or anybody. Science is a killer, laying waste to our pet theories and dispatching our grandest ideas. Sometimes I hate science—that fucker doesn’t respect all my hard work!

Then I wake up, and am so grateful that science dispatches so indiscriminately. I love science. Without it, we’d still believe our planet was flat; that all matter consisted of only earth, water, air, and fire; that trepanning the skull could improve subjective wellbeing; or that phlogiston is released when materials are burned.

We might also still believe that as little as one minute of effort could have measurable downstream consequences for cognitive control.

That last one stings a little. But the results of a massive replication effort, involving 24 labs (or 23, depending on how you count) and over 2,000 participants, indicate that short bouts of effortful control had no discernible effect on low-level inhibitory control. This seems to contradict two decades of research on the concept of ego depletion and the resource model of self-control. Like I said: science is brutal.

Not everyone agrees with my assessment. Some have suggested that the theory is good and sound and healthy, but that the massive replication study was not a fair test of the theory. The chief complaint is that the study did not operationalize variables correctly, and is thus deeply flawed. This, despite the study being a direct replication of an already-published ego depletion study, and despite it being pre-approved as a fair test not only by the leading authority on ego depletion on the planet, but also by many senior scientists with years of ego depletion research under their belts. Many of these experts, myself included, not only thought the replication was appropriate, but also expected the replication would yield positive results. I would have bet money on the replication confirming ego depletion, albeit with a much smaller effect than the published literature would have us believe.

The specifics of the operationalization complaint are straightforward. The replication study operationalized initial bouts of effortful control with the much-used crossing out ‘e’s task, whereby participants are asked to cross out (via button press) words that contain the letter ‘e’ (e.g., “then”), but to withhold crossing out if the ‘e’ is next to or one letter away from a vowel (e.g., “theta”). They did this for about seven and a half minutes. The rub is that although this version of the task has been used before, the more usual way of administering it (including in the first-ever ego depletion paper) involves first entraining participants with the habit of crossing out all ‘e’s for a few moments, and only then having them selectively cross out some ‘e’s. The idea here is that the initial habit-forming stage makes the selective crossing-out step that follows more effortful.

I am not completely unsympathetic to this bit of ad-hockery. If initial bouts of effort meaningfully reduce subsequent control, then there ought to be a dose-response relationship, such that the more effort one expends initially, the more control is reduced subsequently. So, if the modified crossing out ‘e’s task used in the replicated source paper (and in the current registered replication) is less effortful than the usual crossing out ‘e’s task, it makes sense that self-control would be less reduced.

The problem is that the depletion literature is littered with initial effortful tasks that are as effortful as (and often much less effortful than) the ones used in the replication. First, the registered replication was, well, a direct replication of a published depletion paper using this precise task; this version of the crossing out ‘e’s task has also been used in at least one other published paper. If this initial task is not sufficiently effortful to evoke downstream consequences on control, how did these two publications find what they did? Second, even a casual glance at the depletion literature reveals initial effortful tasks that appear no more effortful than the initial task used in the replication. These include letting your mind wander for a few moments on any topic save for white bears, having a structured conversation with a Black confederate, writing a short paragraph without using the letters ‘A’ or ‘N’, recalling a time when one was a victim of prejudice, or taking the perspective of a hungry waiter unable to eat food.

Remarkably, I know of two papers where the initial effortful task involved performing twenty incongruent Stroop trials. Having conducted many, many Stroop studies, I can assure you that twenty Stroop trials, which might take less than one minute, require noticeably less effort than the initial task used in the registered replication; yet these studies found significant downstream consequences.

So, while I agree that the original crossing-out ‘e’s task would have been preferable, there is no principled reason to think that the task used here is not effortful enough to produce depletion. It is for this reason that all of us experts signed off on this replication and thought it was a fair test. It is for this reason that 23 out of 24 labs predicted a significant result.

It is also for this reason that 23 out of 24 labs, including my own, need to update our beliefs.

So how do we update and where do we go from here? As is the case when we fail to reject the null, knowing how to proceed is tricky. I do not think it is justified to say that the resource model is dead or that it has been debunked. Yes, I know that a few paragraphs ago I drew parallels between ego depletion and phlogiston, but I have not genuinely updated my beliefs to that degree; at least not yet.

To be clear, I think there are multiple sources of uncertainty about the robustness of the ego depletion phenomenon, not just this massive replication. These include bias-corrected meta-analyses of the depletion literature that cannot resolve whether the overall depletion effect is different from zero or not; a growing number of smaller failures to replicate; and a series of unresolved theoretical questions about the resource model itself. I should also note that this replication is the only pre-registered*, confirmatory study that I am aware of, as well as being the largest ever lab-based depletion study. It is therefore far more substantial than a regular study and should be given considerably more weight.

Even with all this, I have not yet updated my beliefs to such an extent that I am ready to abandon the lab-based phenomenon altogether. Maybe I’m fooling myself, but I think I need a few more pre-registered, high-powered failures to replicate before I do that. Yeah, I'm a sucker.

I have nonetheless updated my beliefs. Before this replication, I would have advised people to use cognitive reaction-time tasks like the Stroop as their main dependent variables. I love the robustness of these tasks. But, as the massive replication used a Stroop-like variant as a dependent variable, I no longer think this is wise. Various bias-correction techniques converge on this point as well. I also think that we have been fooling ourselves when we thought that as little as one, five, or even ten minutes of effort could have meaningful downstream consequences. Our cognitive machinery is simply too resilient for that. My current thinking is that we should aim for initial tasks that get people to exert effort for 30 or 45 minutes, maybe even more. Researchers who study fatigue in the lab often get their participants to work for upwards of two hours.

Moving forward, I think we need to tighten our methods and think more precisely. I think we need to do a better job of establishing convergent validity, aligning depletion with other related and perhaps isomorphic constructs. Instead of trying to establish a unique identity, jangling away an ego depletion brand, we should ally depletion with clearly related concepts, such as fatigue and even boredom. The study of fatigue, in particular, has a long history; many of the same ideas that we float about depletion today were discussed and resolved by scholars of fatigue decades ago.

I also think we need to complement lab research by moving beyond the strict confines of the sequential task paradigm. When we do, we might conceive of depletion as operating over longer bouts of time: think multiple hours, not minutes. My new favorite depletion/fatigue study was done in the field, involving over 4,000 caregivers and over 13 million data points, showing that compliance with externally mandated rules (hand-washing) dropped as a function of how long into their 12-hour shift the caregivers had been working. This is powerful stuff.

When I first learned about the results of the pre-registered replication a few months ago, I was gutted. It forced me to think deeply about the state of our field, to reassess, and to ask difficult questions. I don’t like the results of this replication study. I don’t think any one of the authors likes the results of this replication study.

But, man, am I happy we ran this replication study. It offers great lessons, if we are willing to hear them.

----------------------------

* It turns out the RRR was the second pre-registered ego depletion study. The first pre-registered study was just recently published, and it too returned a null effect. Thanks to Michael Kane for pointing this out to me.
