
Data Analysis: An Essential Skill for the Legal Community

In November 1999, an English solicitor named Sally Clark was convicted on two charges of murder and sentenced to life imprisonment. This tragic case is notable for many reasons: her alleged victims were her own sons, and both were toddlers when they died.

 

The cause of death in both cases was initially attributed to sudden infant death syndrome (SIDS), also known as cot death in the United Kingdom. We did not know then, and do not know to this day, the specific causes of SIDS. But suspicion grew because two children from the same family had died of unexplained causes, and shortly after the death of her second child, Sally Clark was arrested, tried and convicted.

 

One of the clinching pieces of evidence was expert testimony provided by the paediatrician Professor Sir Roy Meadow. He put the odds of two children from the same family dying of SIDS at 1 in 73 million: in other words, an all but impossible eventuality. On the strength of this and other testimony, Sally Clark was convicted of murdering her own sons and sent to prison for life.

 

One cannot help but ask: how did Sir Roy Meadow arrive at this figure of 1 in 73 million? Succinctly put, the theory was this: for a family with Sally Clark’s level of affluence, the chance of a single infant dying of SIDS was 1 in 8543. This was simply an empirical observation. What, then, were the chances that two children from the same family would die of SIDS?

 

The answer, statisticians tell us, depends on whether the two deaths are independent of each other. If one assumes that they are, then the probability of two deaths in the same family is simply the product of the two individual probabilities: 1 in 8543 multiplied by itself, which is roughly 1 in 73 million. That would be enough to convince any “reasonable man” that the deaths were deliberate and could not have been mere coincidence.
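The arithmetic behind Meadow’s figure can be sketched in a few lines. This is a minimal illustration of the product rule, under the same (disputed) assumption of independence that Meadow made:

```python
# Meadow's calculation, under the (disputed) assumption that the two
# deaths are independent events.
p_one_death = 1 / 8543                    # empirical chance of one SIDS death
p_two_deaths = p_one_death * p_one_death  # product rule for independent events

# 8543 * 8543 = 72,982,849, i.e. roughly "1 in 73 million"
print(round(1 / p_two_deaths))  # → 72982849
```

The number itself is arithmetically correct; it is the independence assumption behind it that was not.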

 

But if the two events are not independent of each other (say, because of underlying genetic or environmental factors that we are simply not yet aware of), then it is entirely possible for multiple children from the same family to die of SIDS. In fact, research shows that, given one SIDS death in a family, the likelihood of a second SIDS death goes up.[1]
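If the deaths are dependent, the product rule no longer applies; one must instead use the conditional probability of a second death given a first. The sketch below uses a purely illustrative dependence factor `k`, which is our assumption for the example and not a figure from the cited research:

```python
p_first = 1 / 8543

# Hypothetical dependence factor: suppose a family that has suffered one
# SIDS death is k times more likely than baseline to suffer a second.
# The true value of k is an empirical question; 10 is purely illustrative.
k = 10
p_second_given_first = k * p_first

# Chain rule for dependent events: P(both) = P(first) * P(second | first)
p_both = p_first * p_second_given_first

print(round(1 / p_both))  # roughly 1 in 7.3 million, not 1 in 73 million
```

Even a modest degree of dependence moves the joint probability by an order of magnitude, which is precisely why the independence assumption mattered so much.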

 

Sally Clark’s convictions were overturned on her second appeal, and she was released from prison. She died four years later due to alcohol poisoning.

 

We live in a world that generates, captures and analyses an impossibly large amount of data. Everyone, from the shop that sells you a cup of tea to the Government, collects data about you and shares it with other entities, each of which builds up a “profile” of you that is then used to arrive at decisions about you.[2]

 

These decisions can range from the seemingly mundane (what percentage discount is likely to make you buy a particular good) to the tragically devastating. As Sally Clark’s example makes all too clear, a flaw in the decision-making process can change your life forever.

 

For this reason, becoming familiar with data and the processes that surround it is no longer an arcane hobby. It is very much a life-skill, and one that none of us can afford to be without.

 

But this point applies with even more force to members of the legal fraternity. Establishing a “fact” beyond reasonable doubt is a phrase used liberally in legal proceedings, but what it means in practice is not always clear even to practitioners of the law. And that, as we have seen, can have truly tragic consequences.

 

A fact is “any thing, state of things, or relation of things, capable of being perceived by the senses”. What about a “fact being proved”? “A fact is said to be proved when, after considering the matters before it, the Court either believes it to exist, or considers its existence so probable that a prudent man ought, under the circumstances of the particular case, to act upon the supposition that it exists.”[3]

 

Are our perceptions infallible? A substantial body of literature shows that they are not.[4] Moreover, perception is only one part of the process: having perceived something, our brains go on to interpret it, and only then register a thing as a fact. As the example of Sir Roy Meadow’s testimony shows, our interpretations are not infallible either.[5]

 

This leads us to an uncomfortable but critically important set of questions: when is a fact a fact, and when is it proved? If something is a fact, what can we deduce from it? Is there probability involved and, if so, how confident are we that members of the legal fraternity understand probability, and by extension statistics, well enough?

 

Our contention is that while law students are taught facts as words and paragraphs, they are not au fait with the world of numbers, and this needs to change now. Lawyers need to learn to interrogate numbers given to them by “experts”, rather than accept them in absolute terms, just as they have learned to interrogate words and paragraphs of evidence.

 

What we refer to as data is a collection of facts. Performing statistical analysis on these facts allows us to reach certain conclusions, and these conclusions in turn allow us to arrive at decisions. But because facts themselves are based at least in part on probability, each subsequent step in this process is also based on probability. We live, in short, in an uncertain world filled with data.

 

That uncertainty is also shaped by context. Consider, for example, the well-known but still widely misunderstood phenomenon of the prosecutor’s fallacy.[6] The weight of a piece of statistical evidence is contingent upon the context and priors of a case, and without them, misleading and potentially dangerous misinterpretations become all too likely.[7] Andrew Gelman famously said, “If you do not know the context, statistics are meaningless”, and we agree: statistics and data analysis reports can be very misleading if the context is not fully understood.
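The fallacy is the leap from “the evidence is improbable if the accused is innocent” to “the accused is probably guilty”. A simple Bayesian comparison of the two rival explanations makes the distinction concrete; both numbers below are illustrative assumptions, not figures from the case record:

```python
# Two rival explanations for the same evidence (two infant deaths in one
# family). Both priors are purely illustrative.
p_two_sids = 1 / 73_000_000          # Meadow's (disputed) estimate
p_double_murder = 1 / 2_000_000_000  # hypothetical rate of double infant murder

# Given that two deaths did occur, the probability of the innocent
# explanation is its share of the total probability of either explanation.
p_innocent = p_two_sids / (p_two_sids + p_double_murder)

print(round(p_innocent, 3))  # → 0.965
```

The point is not the specific numbers but the comparison: if the guilty explanation is itself far rarer than the innocent one, a tiny P(evidence | innocence) still leaves innocence the more probable conclusion. Confusing P(evidence | innocence) with P(innocence | evidence) is exactly the prosecutor’s fallacy.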

 

For all of these reasons, and more besides, we hold that teaching students of law about data and statistics is nothing short of imperative. Specifically, we recommend that law students be made aware of how data is generated in today’s world, how it is captured and labelled, how it is stored, how it is retrieved and, finally, how it is combined with other data sources. The processes involved in each of these need to be understood, along with their legal implications.

 

In addition, every law student must be made aware of the statistical processes that are used to arrive at conclusions. These conclusions rest on some measure of probability, and it is the balance of probability that often becomes the deciding factor in the resolution of a case. A clear understanding of these processes, their pitfalls and their limitations is therefore also necessary.

 

Without these skill sets, members of the legal fraternity risk arriving at decisions that have the potential to change lives forever. Even worse, with the advent of tools such as machine learning and artificial intelligence, decision-making itself may soon be outsourced to a statistical model of some sort, with all of its attendant potential for misclassification.

 

The world may well have progressed too far along this path for us to imagine retracing our steps. But a deeper familiarity with our current environs, and an understanding of what lies beyond, would go a long way towards ensuring that the Sally Clarks of the future do not suffer the same, entirely avoidable fate that befell the original.

 

That is our intention behind proposing that a study of the field of statistics and data analysis be made mandatory in all law schools in this country. We hope to have convinced you beyond any shadow of doubt.

 


Murali Neelakantan is the principal lawyer at amicus. He was formerly global general counsel at Cipla and global general counsel and executive director at Glenmark.

†† Ashish Kulkarni teaches courses in economics and statistics at the Gokhale Institute of Politics and Economics, Pune, and blogs at econforeverybody.com 

[1] Ray Hill, Reflections on the Cot Death Cases, Medicine, Science and the Law 47.1 (2007): 2-6. See also the various cases in the United States where convictions based on DNA evidence were overturned. For a review of some issues surrounding the use of DNA evidence, see Murali Neelakantan, DNA Testing as Evidence – A Judge’s Nightmare, Journal of Law and Medicine (1996).

[2] John Cheney-Lippold, We Are Data — Algorithms and the Making of Our Digital Selves (2020) is an excellent book on this topic, which should be essential reading for everyone.

[3] Evidence Act of 1872, S. 3.

[4] Shepard, Roger N., Mind Sights: Original Visual Illusions, Ambiguities, and Other Anomalies, with a Commentary on the Play of Mind in Perception and Art, W.H. Freeman/Times Books/Henry Holt & Co., 1990.

[5] See also, Taleb, Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets (2001); Kahneman, Thinking, Fast and Slow (2011); and Kahneman, Noise: A Flaw in Human Judgment (2021).

[6] De Macedo, Carmen, Guilt by Statistical Association: Revisiting the Prosecutor’s Fallacy and the Interrogator’s Fallacy, The Journal of Philosophy 105.6 (2008): 320-332.

[7] Koehler, J. (2000), The Psychology of Numbers in the Courtroom: How to Make DNA-Match Statistics Seem Impressive or Insufficient, S. Cal. L. Rev., 74, 1275.
