The Replication Crisis in Science and the Challenges of Prediction

(Philosophy of Science series #4-2)

In the last piece, I discussed the differences between the natural and social sciences, and the methodological challenges that flow from attempts to measure and define the object of study – something that is both qualitative and quantitative in nature. Here, I examine the replication crisis in the applied sciences (particularly psychology and medicine), inherent challenges with prediction, and suggest that there is a resolution to be had through the integration of millennia of wisdom about human nature into theoretical and predictive endeavours in the sciences.

This is the second part of the fourth instalment in the Philosophy of science series. In the links below, you can find the previous entries.

1. The Failure to Replicate

Two criteria that are crucial to the durability and empirical adequacy of scientific findings are replicability and prediction. In the social and applied sciences in particular, both are elusive for reasons inherent to the domain of inquiry.

Replicability is essential to establish general principles that accurately apply to analogous, yet unique circumstances in a given domain.

In 2005, J.P.A. Ioannidis published a now famous paper on replicability that illuminated a problem common to many fields of science.[1]

He argues, provocatively, that most published research findings are false because their conclusions are overstated and under-supported. The problem is significant and applies to fields as diverse as medicine, psychology, and the other social sciences.

The reasons are as follows. Often, he writes, claims of "conclusive research findings" are based solely on a "single study assessed by formal statistical significance, typically for a p-value less than 0.05."[2]

He notes that for a field of scientific study to be reliable, the underlying body of research upon which findings are built must be solid and well tested. When poorly conducted studies, and the conclusions derived from them, are common, the pool of research resembles a poisoned rather than a pristine well.

The main factors that raise the incidence of weakly supported conclusions in a scientific field, and reduce the likelihood that its research findings are true, are:

Small study sizes
Small effect sizes (the effect size being the number measuring the strength of the relationship between variables)
A greater number, and a lesser preselection, of tested relationships
Greater flexibility in designs, definitions, outcomes, and analytical modes
Greater financial and other interests and prejudices
A 'hotter' field, with more scientific teams involved.[3]

The desire to publish 'positive' results, and the temptation to bend methodology to that end, is the thread running through all these factors. There is considerable leeway for such manipulation when methodology is loosely defined and weakly enforced.


Other well-known tendencies in scientific research include p-hacking[4] (trying different data selections or analyses until a particular result crosses the threshold of statistical significance), 'HARKing'[5] (hypothesizing after the results are known), and methodology specified so loosely that replication studies are not comparable with the original.
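A small simulation can make the mechanics of one common form of p-hacking concrete: measuring many outcomes and reporting whichever one happens to reach significance. The sketch below is illustrative only and rests on stated assumptions: every tested effect is truly zero, the tests are independent, and each p-value under a true null is uniformly distributed on [0, 1]. The function names are hypothetical, and real p-hacking (optional stopping, flexible covariates, selective exclusions) usually involves correlated analyses, so actual rates will differ.

```python
import random


def spurious_hit_rate_exact(num_outcomes: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_outcomes` independent tests of
    true null hypotheses comes out 'significant' at level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_outcomes


def spurious_hit_rate_simulated(
    num_outcomes: int,
    alpha: float = 0.05,
    num_studies: int = 20_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the same probability.

    Assumption: each test's p-value under a true null is uniform on [0, 1].
    A simulated 'study' counts as a hit if its smallest p-value is below alpha,
    mimicking a researcher who reports only the best-looking outcome.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_studies):
        smallest_p = min(rng.random() for _ in range(num_outcomes))
        if smallest_p < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for k in (1, 5, 20):
        exact = spurious_hit_rate_exact(k)
        simulated = spurious_hit_rate_simulated(k)
        print(f"{k:>2} outcomes tested: exact {exact:.3f}, simulated {simulated:.3f}")
```

Under these assumptions, testing a single outcome produces a false positive 5% of the time, as designed, but testing 20 outcomes and reporting the best one produces at least one 'significant' result in roughly 64% of studies of an effect that does not exist (1 - 0.95^20 is about 0.64).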

To support his conclusion that 'most research findings are false,' Ioannidis models the probability that a claimed finding is actually true (its positive predictive value) as a function of a field's pre-study odds, statistical power, significance threshold, and bias, and estimates it for typical research designs. Since many of these estimates fall below 50%, he concludes that most research findings are false, or at least misleading overstatements.
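The core of Ioannidis's argument can be expressed as a short calculation. In its simplest form (ignoring the bias term his paper also models), the probability that a 'significant' finding is true is PPV = (1 - β)R / (R - βR + α), where R is the pre-study odds that a tested relationship is true, 1 - β is the study's statistical power, and α is the significance threshold. The Python sketch below evaluates that formula; the function name and the example values of R and power are illustrative assumptions, not figures taken from the paper.

```python
def positive_predictive_value(prior_odds: float, power: float, alpha: float = 0.05) -> float:
    """Post-study probability that a statistically significant finding is true.

    No-bias form of the formula in Ioannidis (2005):
        PPV = (1 - beta) * R / (R - beta * R + alpha)
    prior_odds -- R, the pre-study odds that the tested relationship is true
    power      -- 1 - beta, the probability of detecting a true relationship
    alpha      -- the significance threshold (type I error rate)
    """
    beta = 1.0 - power
    return (1.0 - beta) * prior_odds / (prior_odds - beta * prior_odds + alpha)


if __name__ == "__main__":
    # Hypothetical scenarios: one true relationship for every ten null ones (R = 0.1).
    scenarios = {
        "well-powered study (power 0.8)": (0.1, 0.8),
        "underpowered study (power 0.2)": (0.1, 0.2),
    }
    for label, (prior_odds, power) in scenarios.items():
        ppv = positive_predictive_value(prior_odds, power)
        print(f"{label}: PPV = {ppv:.2f}")
```

With these assumed inputs, the well-powered study gives a PPV of about 0.62, while the underpowered one falls to about 0.29. Under those conditions most 'significant' findings would be false, which is the sense in which small studies and small effects (both of which lower power) drive the conclusion.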

To take a particular case, in psychology (especially social psychology) the factors contributing to poor replicability include poor research and statistical methods, replication standards that are not clearly defined, and the lack of a theoretical framework to set expectations and determine whether findings are (dis)confirmatory.[6] Psychological studies often use insufficient sample sizes, and the 'subjects' in question are poor comparators. Comparing people is inherently difficult, given differences in behaviour and belief that multiply with time, culture, and circumstance, further separating the test subjects and environment of an initial study from those of later studies that attempt to validate its results. The inability to capture these differences meaningfully has led to failed replication attempts, and has led many to question whether replication can be done well in practice with current methods, or even in principle.
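Why small samples matter can be shown with a rough power calculation. The sketch below uses a normal approximation to a two-sided, two-sample comparison of means, assuming equal group sizes and a common standard deviation, with the effect expressed as a standardized difference (Cohen's d). It relies only on Python's standard-library statistics.NormalDist; the function name and the example effect size of d = 0.4 are illustrative assumptions, not values drawn from the studies cited here.

```python
from statistics import NormalDist


def approx_power_two_groups(effect_size_d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of means.

    Normal approximation: with n subjects per group, the test statistic is
    centred on d * sqrt(n / 2) when the true standardized effect is d.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        power = approx_power_two_groups(effect_size_d=0.4, n_per_group=n)
        print(f"n = {n:>3} per group: power = {power:.2f}")
```

Under these assumptions, a study with 20 participants per group has only about a one-in-four chance of detecting a true effect of that size, while roughly 100 per group are needed to reach the conventional 80% target. A same-sized replication will therefore usually miss the effect even when it is real, and an underpowered original study that does cross p < 0.05 is likely to overstate it.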

The Problems Go Deeper

Beyond these technical problems, which are methodological, internal to scientific disciplines, and to some extent fixable, there are more prosaic and less surmountable reasons why the social and applied sciences do not replicate well.

First, the domain of study is immeasurably complex. Consider how different a brain scan or psychological profile is from the intricate details of a person's life, and from a personal, intimate familiarity with the subtleties of their character and mental states (i.e., a first-person, subjective relationship with an individual and their psychology). Consider how different the third-person accounts found in sociology or history are from an intimate knowledge of the web of relations, institutions, norms, beliefs, and history that make up a community, a state, a province, or a nation. Consider, too, how economic or political systems compare with one another on paper, as understood through economics and political science, versus how complex and nuanced they appear to someone intimately familiar with the workings of just a handful of provinces, states, or countries. How, then, could an anthropologist, especially one of the feminist or 'critical race' bent, reasonably claim to compare peoples and civilizations across the ages by subjecting them to the standards of the contemporary moment, in the embarrassingly ideological way in which many do?

As time goes on, significant changes occur in individuals and the groups to which they loosely belong. With each passing generation, a finding in social science loses some of its generalizability, since it was only ever valid for test subjects and an experimental environment that no longer exist.

The differences between proximate generations are already great, to say nothing of the social and historical imaginaries that separate peoples across the ages. Between the lives of a blacksmith in 12th-century England, a peasant in 15th-century Italy, a centurion in 4th-century Gaul, or a queen in 8th-century Syria and the life of anyone today lie entire ways of seeing, thinking, and acting. To treat their experiences, circumstances, and causal factors as similar or analogous is certainly possible (and is done much more admirably by rigorous scholars in the humanities), but not in a way that approaches the methods employed in the natural sciences.

There is also the factor problem: so many factors play explanatory roles in any given situation that we cannot hope to take them all into account, control for them, and systematically understand them in a fashion that claims to be rigorously scientific, especially when this is coupled with the complexity of the object domain of study.

Indeed, many variables, such as personality traits and social norms formed over many years through intricate interactions, are so complex that an adequate understanding of them, whether qualitative or quantitative, defies a simplistic cut-and-paste application of the method and spirit of the natural sciences.

Lastly, the object domain of both the natural and social sciences is necessarily holistic at some level of resolution; that is to say, the objects of study are concrete wholes. It won't do simply to 'control for' differences in time, place, and so on with statistical techniques because, contrary to what proponents of materialist reductionism suppose, the whole is always greater than the sum of its parts. Its existence emerges from, but is not reducible to, the sum of its components.

In other words, the contexts of a given time and place, which are necessarily constitutive of the very identity of an object of study, are holistic by definition. Like all complex wholes, an object of study in the social sciences cannot be understood as the sum total of factors that can be isolated and teased apart, with neat and tidy causal influence assigned to each as we go.

This is not to say that we cannot come to recognize patterns and similarities, but the reasoning is never determinate in the same way. It is rarely as certain or as clear-cut as researchers, practitioners, or popularizers wish that a particular intervention will have a given effect in the social world, nor will the effect ever be confined to the circumstances under investigation, for the ripple effect is larger when we are speaking of the social. This is true to varying degrees from medicine and psychology up to sociology and economics. Yet these disciplines tend to be dominated by an ideology that views the social world as a mechanism and holds inadequate views about human nature and its manipulability.

Nor is data-driven science a remedy for these difficulties, since the problem lies in the nature of the object of study itself. Individual psychological and social phenomena are replete with value: the first-person interpretations of the significance of things, not descriptions of their physical properties. Value of this kind is fundamentally inaccessible to the microscope or the causal diagram. Here, explanation lies in understanding and interpreting the realm of percepts and concepts, the formal natures of things that are abstracted from experience and grasped by the intellect. Explanation and understanding in this domain can be rigorous and objective, and can yield real knowledge, but through the tools of logic and reflective understanding, not the methods of the natural or social sciences.

Finally, it is important to distinguish this line of argument from an appeal to ignorance or to gaps in explanation. The claim is not that we currently lack the relevant details, but that it is impossible in principle to list every detail that influences the object of study in question and then control for each one, so as to compare objects of study across time, place, and circumstance and arrive at law-like generalities about an object domain in the social sciences.

For that to be possible, the physical world would have to be purely mechanistic and exhibit a closed determinism. Instead, at present it appears to be merely constrained by constants, fundamental particles, and the range of potential fluctuations to which quantum-level phenomena are confined. There is scientific support for a view that I would argue is demonstrable philosophically: that there is freedom, chance, agency, and indeterminacy at the quantum level, where particles do not have a fixed state or location, taking one on only as they are actualized by causes, many of which appear to be non-physical in nature, most notably conscious observation. There is order and causal determination in the world to be sure, but not of a kind that is neatly deterministic. Rather, it is open-ended and free within a set of strong constraints.

Karl Popper's ground-breaking work The Poverty of Historicism did a great deal to articulate the general reasons why the social and natural sciences are of limited use in making strong predictions about the future, although that was perhaps not the author's intention. Their usefulness decreases considerably as the complexity of the subject matter and the scale of the proposed interventions increase.

In the context of a critique of large-scale (in his terms, 'holistic' or 'Utopian') social engineering, and of the Marxist historical and culturally Marxist 'social justice' understandings of social life and history, Popper takes a thorough look at the epistemological assumptions behind social engineering, which he argues rest on an understanding of social science that is 'historicist'.

Historicism he defines as "an approach to the social sciences which assumes that historical prediction is their primary aim, and that this aim is attainable by discovering the 'rhythms' or the 'patterns', the 'laws' or the 'trends' that underlie the evolution of history."[7]

Among the epistemological assumptions of social scientists who adopt this perspective is holism. The word 'whole,' Popper notes, "is used to denote (a) the totality of all the properties or aspects of a thing, and especially of all the relations holding between its constituent parts, and (b) certain special properties or aspects of the thing in question, namely those which make it appear an organized structure rather than a 'mere heap'."[8]

Now, Popper is not critical of the study of wholes per se, but of the belief that they can be studied in the same way as the atomistic properties of the physical world are studied in the natural sciences. Such investigation proceeds by isolating one part of the world and focusing inquiry on it, to the exclusion of its larger context, so as to draw out an aspect of reality and its properties.

What he rightly criticizes is the belief that wholes (in both senses outlined above) can be studied scientifically and become the basis of a social-science-informed activism: one that attempts to study the social world and intervene broadly so as to achieve large-scale objectives. Such a view is anything but scientific, for the reasons outlined below.

Problems with the Historicist View

The historicist view requires a scientifically rigorous description and law-like understanding of the whole of society to support its claims. Because historicists make predictions based on a body of knowledge, they assume that scientific knowledge can be obtained of, and applied to, large parts of the social and natural world, and that their chosen interventions will lead to the desired outcomes.

Neither assumption holds in a strong sense. First, whether there are laws, patterns, or trends governing large, complex social phenomena is not a matter that can be determined scientifically. Given the scale and scope of the phenomena, there is no way to test for them rigorously or to replicate such tests, for one does not control the conditions of any experiment, nor are there large-scale comparators across time and place. Knowledge of social and historical phenomena can surely be had, but not in a way that is scientific.

Secondly, the claim that an intervention will have a given set of outcomes is not a scientific one. The effectiveness of a policy initiative in matters related to climate, education, or poverty, for example, can be known approximately, by estimating which causal factors are likely to persist or change over time, but not in a way that is rigorously scientific.

Those infused with the historicist spirit often ignore the unintended side effects of their proposed interventions, misjudge whether trends will continue, and misunderstand the constants of human nature and the lessons of history. They tend to assume that the trends their model requires will persist into the future. This often stems from a narrow-minded 'progressivism' that assumes most novelty is good, and that good things result from the continuous unfolding of progressive beliefs.

Neither assumption enjoys solid support under scrutiny. Many things that appear novel today have existed in the past in a different guise and have proved destructive. We see this clearly in major contemporary issues and thought patterns such as environmental degradation, poverty and crime rates, class inequality, and mental health challenges: the legacy not so much of conservatism as of unbridled liberalism.

Classical and contemporary strands of liberalism are major contributors to the belief in the unrestrained use of technology in the pursuit of economic development and to the consumerist mentality, which are main drivers of environmental degradation. The advent of social media and the widespread availability of technology and the internet have unleashed vitriol, fuelled social dysfunction, and exacerbated divides in many countries, to say nothing of rising social isolation and health effects like obesity, drug abuse, and domestic violence.[9] Both developments have been championed as unbridled goods by the prophets of progress.

The sexual revolution reimagined uncultivated and unregulated sexual activity as a positive expression of repressed subjectivity, rather than a harmful vice and addiction that enslaves the person to lust and objectification, rendering the deeper fulfilment that comes from union in a relationship and all the goods that flow from it virtually unknown to many in the post-60s generations. This has been a contributing factor to many social ills, such as high rates of divorce, single parenthood, and the concomitant effects on children of single parents[10], the worst of which include teen pregnancy for women[11], incarceration for males[12] and a significant decline in relationship and career satisfaction among women.[13]

Lastly, the worldview that emphasizes social mobility and its connection to quantitative equality and the dignity of the person has had the unintended side effect of tying that dignity, more strongly than ever, to one's position on the ladder of social status. This is not an ennobling, but rather an objectifying outcome. It has led to high levels of anxiety and low self-esteem among vast swaths of the population who imbibe the message that they are only fulfilled when they are moving upward and are 'equal' in terms of status, ability, power, and the possession of goods.

In more traditional and communitarian cultures around the world today and in the past – a world where value is tied to character development and the performance of one’s role in society – people achieve social recognition and respect from living a good life as members of their class and community, and often lead much more fulfilling lives as a result.

In short, the side effects of large-scale interventions in the social world are very difficult to predict, but certain things can be known from a refined, normative, and philosophical understanding of human nature developed over the centuries.

What does this look like? In marshalling social science research as I have here, I do not mean to commit the fallacy I have criticized in this article. It is easy to conduct many studies that highlight correlations and point tentatively toward certain findings. To interpret anything at a higher level of resolution, one must have recourse to normative knowledge about the human person, their nature, and the good of individual, social, and political life over time, consistent with and balanced against a wide range of goods that stand in relationships of tension, mutual exclusivity, and complementarity at different times and places.

The world is complex, and so many competing explanations are bandied about at any given time that it is natural to seek out those that conform to our preconceived notions. We often look back at history and forget how many predictions proved utterly inaccurate compared with the successful ones. This is the progressive fallacy: the belief that the good things of today are the result of 'progressive' changes in the past, ignoring how many such changes failed and led to harm and decline for reasons knowable in advance to ordinary persons of wisdom and reflective understanding. Unfortunately, many of these prosaic observations, which many people will perhaps admit to privately, are abandoned altogether and rejected by opinion-makers, academics, and journalists who champion a crude understanding of 'science', 'evidence-based policy' and 'social engineering' in the name of 'progress'. Yet neither are these lessons heeded by conservative-leaning people and parties, who often simply reapply particular policy frameworks to changed circumstances for which they are no longer suited. Those inclined to either side should recognize both the principles at stake that ought to be preserved and developed, and the particular modifications needed to realize them in an ever-changing world.

3. Conclusion: Partiality and Unification

From the preceding, we may conclude of the natural sciences that they identify the efficient causes between entities and the processes that unite them with great precision, but they fall short of more ambitious claims. They cannot demonstrate the validity of their assumptions, the adequacy of their theoretical framework, or the grounding of their domain ontology – the entities that populate the model of their object domain, whether that be cells and systems in biology, particles and physical forces in physics, social classes and forces in sociology, or mental concepts in psychology.

The social sciences can adopt some of these methods and assumptions up to a point, but are more prone to overstate the scope of their findings in fundamentally inaccurate ways. This is because of several endemic challenges.

First, they tend to conflate the meaning and value of a finding with its factual status; they often assume, rather than demonstrate, the objectivity of the taxonomy of variables they employ; and they assume the normative framework from which they operate.

Secondly, in terms of methodology and processes of inference, they tend to point to the causal history of some particular process and equate it with a generalized normative understanding of the moral or aesthetic value of that same thing. As we saw in the last piece, this includes citing a polygenic score associated with a predisposition to a particular behaviour and concluding that the behaviour is 'natural' and good for the person, or, in the case of a qualitative state such as happiness, concluding that it is caused by, and coextensive with, that set of genes. This gives the impression that the analysis is 'scientific,' when in fact the conclusions do not follow; they are not argued for, but assumed.

Third, the link to practical decision-making leads to a blurred boundary between the validity of findings and the validity of the ends to which they are put. The label ‘science’ is misapplied in order to command authority and legitimize a given course of action, where in fact it only weakly applies to the study of processes and efficient (material agent) causes.

Lastly, the ignorance of holism in its many forms leads to overstatement and outright misunderstanding of complex phenomena, often through truncation and oversimplification of the concepts. Definitions of concepts and factors in theoretical models are often taken to be easily identified in deceptively neat and tidy terms where they are in fact more complex. The holistic context of the social setting (historical and cultural elements) in which every experiment is embedded is often ignored.

A common feature of both natural and social science is the inherent presence of values, through and through. Practitioners and lay people alike may not appreciate that this does not disqualify the sciences from being objective; rather, their objectivity is grounded in logic and philosophical reasoning at many different levels.

The difficulty of reductive translation between physics and chemistry, or between biology and social science, and of reducing psychological 'happiness' to brain states and therapeutic techniques, does not reflect a need for more 'data' or for a 'theory of everything' from physics. It stems from a refusal to recognize that some questions, by their very nature, admit only of a certain kind of answer.

That answer, a kind of unified explanation, is to be articulated through a combination of the tools of reason, logic, understanding, and judgment: everyday practices that people engage in.

We are better off using our scientific understanding of the causal factors that influence phenomena in various domains of study to inform us of factors that have the potential to influence a particular outcome, rather than treating it as the only tool we use in trying, and ultimately failing, to answer a question exhaustively.

To understand the possibilities of specific interventions and their chances for success, it is best to start with the wisdom about the nature of the human person at the individual and social levels, as built up over millennia, combined with our more immediate qualitative understanding of the circumstances of the present and the advances in social and natural science. More on this to come.

[1] John P. A. Ioannidis, “Why Most Published Research Findings Are False,” PLoS Medicine 2, no. 8 (August 30, 2005): e124, https://doi.org/10.1371/journal.pmed.0020124.

[2] Ioannidis.

[3] Ioannidis.

[4] G. Davey Smith, “Data Dredging, Bias, or Confounding,” BMJ 325, no. 7378 (December 21, 2002): 1437–38, https://doi.org/10.1136/bmj.325.7378.1437.

[5] Norbert L. Kerr, “HARKing: Hypothesizing After the Results Are Known,” Personality and Social Psychology Review 2, no. 3 (August 1998): 196–217, https://doi.org/10.1207/s15327957pspr0203_4.

[6] Michael Muthukrishna and Joseph Henrich, “A Problem in Theory,” Nature Human Behaviour 3, no. 3 (March 2019): 221–29, https://doi.org/10.1038/s41562-018-0522-1.

[7] Karl R. Popper, The Poverty of Historicism, repr., Routledge Classics (London: Routledge, 2002), 3.

[8] Popper, 76.

[9] Sherry Turkle’s work has been particularly illuminating about the impacts of technology on isolation, mental health and weakening communities. Sherry Turkle, Reclaiming Conversation: The Power of Talk in a Digital Age (New York: Penguin Press, 2015).

[10] Jane Anderson, "The Impact of Family Structure on the Health of Children: Effects of Divorce," The Linacre Quarterly 81, no. 4 (November 2014): 378–87, https://doi.org/10.1179/0024363914Z.00000000087.

[11] Siu Kwong Wong, “The Effects of Single-Mother and Single-Father Families on Youth Crime: Examining Five Gender-Related Hypotheses,” International Journal of Law, Crime and Justice 50 (September 2017): 46–60, https://doi.org/10.1016/j.ijlcj.2017.04.001.

[12] Cynthia C. Harper and Sara S. McLanahan, “Father Absence and Youth Incarceration,” Journal of Research on Adolescence 14, no. 3 (September 2004): 369–97, https://doi.org/10.1111/j.1532-7795.2004.00079.x.

[13] Betsey Stevenson and Justin Wolfers, “The Paradox of Declining Female Happiness” (Cambridge, MA: National Bureau of Economic Research, May 2009), https://doi.org/10.3386/w14969.

Peter Copeland

Philosophy, Faith, Public Policy and Culture
