Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony and Cass R. Sunstein

Recommendation

Professor Daniel Kahneman brings his expertise in decision-making to bear on the phenomenon of noise. When you use your judgment to make evaluations or predictions, you are liable to make errors without knowing how or why. Examining medicine, the judicial system and insurance, Kahneman and co-authors Olivier Sibony and Cass R. Sunstein expose egregious, undetected errors that a “noise audit” could have revealed. By managing noise, they assert, you can solve problems instead of creating new ones.

Take-Aways

  • Judgment seeks to find “true value,” which is not the same for everyone.
  • Noise and bias contribute to errors in judgment.
  • Mechanical judgment eliminates complexity and randomness, and is thus more reliable than clinical judgment.
  • System noise, level noise and pattern noise contribute to error in different proportions.
  • Improve your judgments by using “decision observers” to reduce bias.
  • Deploy “decision hygiene” methods to prevent noise before it happens.
  • In ranking systems, noise occurs when judgments are absolute, not relative.
  • Eliminating noise entirely may not always be worth the trouble.


Noise Book Summary

Judgment seeks to find “true value,” which is not the same for everyone.

The human mind is a “measuring instrument,” and judgments are the measurements. A judgment is a conclusion, not an argument. Computation resides on one side of the spectrum, and taste and opinion on the other. Between them lies the realm of judgment.

Making a good judgment is not the same as having good judgment overall. Matters of taste, in which variability is acceptable and even desirable, do not call for judgment. Judgment aims at a true value, even though different people will estimate it differently. That unwanted variability is what makes human judgment fallible.

“A general property of noise is that you can recognize and measure it while knowing nothing about the target or bias.”

Judgments fall into two categories in which inconsistency is problematic, but for different reasons:

  1. Predictive judgment – Forecasters judge outcomes on the basis of probabilities. When two doctors or two weather forecasters come to vastly different conclusions using the same data, that indicates noise.
  2. Evaluative judgment – These judgments rely on values and preferences, and noise occurs when decisions appear arbitrary, instead of conforming to agreed-on criteria.

Measuring the accuracy of predictive judgments after the fact is almost impossible, especially if they are conditional or long-term. Disparities in evaluative judgments, particularly in systems, lead to unfairness. Inconsistency tarnishes trust and credibility.

Noise and bias contribute to errors in judgment.

To understand the difference between bias and noise, imagine a target and shooters. Biased shooters consistently miss the bull’s-eye in a recognizable pattern. Noisy shooters, on the other hand, produce a random scatter. You can measure that scatter even from the back of the target, without knowing where the bull’s-eye is; bias, by contrast, only becomes visible once you know the target. Bias is a consistent deviation from the true value, such as a scale that always adds five pounds to your weight. Noise is variability around an average, such as a manager whose estimates of how long similar projects will take are sometimes far too long and sometimes far too short.
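As a minimal sketch, assuming Python's standard `statistics` module and made-up scale readings, the two kinds of error can be computed separately. Bias is the average error relative to a known true value; noise is the standard deviation of the readings around their own mean, which can be measured without knowing the true value at all.

```python
from statistics import mean, pstdev

def bias(measurements, true_value):
    """Average error: how far, on average, the measurements land from the true value."""
    return mean(measurements) - true_value

def noise(measurements):
    """Spread of the measurements around their own mean; no true value needed."""
    return pstdev(measurements)

# Hypothetical readings (in pounds) for a person who truly weighs 150 lb.
biased_scale = [155.0, 155.0, 155.0, 155.0]  # always adds five pounds
noisy_scale = [146.0, 153.0, 149.0, 152.0]   # scattered, but centered on 150

print(bias(biased_scale, 150.0), noise(biased_scale))
print(bias(noisy_scale, 150.0), noise(noisy_scale))
```

The biased scale has a bias of 5.0 and zero noise; the noisy scale has zero bias but a noise of about 2.74 pounds.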

“What people usually claim to strive for in verifiable judgments is a prediction that matches the outcome.”

Noise occurs when conflicting information requires interpretation – because two people may not see a problem in the same light, even if they possess the same knowledge. All they can do is weigh possibilities and assign probabilities, because no single clear, correct answer exists. For example, a candidate for a job may have a difficult character while being ambitious, smart and capable. How do you predict that candidate’s success as a CEO? In one study, the accuracy of predictions ranged from 10% to 95%.

Mechanical judgment eliminates complexity and randomness, and is thus more reliable than clinical judgment.

Many judgments are predictive and, therefore, verifiable in principle, which makes them instructive about noise. When professionals, machines and simple rules are compared, professionals commit the most errors. To measure predictive accuracy, researchers use the “percent concordant”: the percentage of pairs of cases that the judgment and the actual outcome rank in the same order. Because it applies to any predictor, it makes it possible to compare clinical and mechanical judgments to determine which is more accurate.
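The percent concordant can be illustrated with a short sketch. Assuming hypothetical ratings and later performance scores, the function below compares every pair of cases and counts how often the predictions and the outcomes order the pair the same way. Skipping tied pairs is a simplifying choice made here, not a rule taken from the book. A value of 50 percent is what chance alone would produce, and 100 percent means the predictions order every pair correctly.

```python
from itertools import combinations

def percent_concordant(predictions, outcomes):
    """Share of case pairs that predictions and outcomes rank in the same order.

    Pairs tied on either variable are skipped (a simplifying assumption).
    """
    concordant = total = 0
    for i, j in combinations(range(len(predictions)), 2):
        dp = predictions[i] - predictions[j]
        do = outcomes[i] - outcomes[j]
        if dp == 0 or do == 0:
            continue  # ties carry no ordering information
        total += 1
        if (dp > 0) == (do > 0):
            concordant += 1
    return 100.0 * concordant / total if total else float("nan")

# Hypothetical ratings of five employees and their later performance scores.
ratings = [3, 5, 2, 4, 1]
performance = [60, 80, 65, 75, 50]
print(percent_concordant(ratings, performance))
```

Here nine of the ten pairs are concordant, so the function returns 90.0.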

For example, take two job candidates and try to predict which of them will succeed. A mechanical rule is more constrained and may simply weight every factor equally, but those constraints make it reliable. Too often, human judgment relies on so many intuitive factors that decision-making becomes almost random. You may think your judgment is more nuanced than a machine’s, but your mood, the moment and your personal preferences all sway it, so it cannot match the accuracy of a mechanical prediction.
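The equal-weighting idea can be sketched in a few lines. This is a minimal illustration, assuming three hypothetical candidates and three made-up cues: each cue is converted to z-scores so that cues on different scales become comparable, and the standardized scores are simply added, with no cue weighted more than another. The rule assumes every cue varies across candidates; a constant cue would cause a division by zero.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize scores so that cues measured on different scales are comparable."""
    m, s = mean(values), pstdev(values)  # assumes s > 0
    return [(v - m) / s for v in values]

def equal_weight_scores(cues):
    """Sum standardized cues with equal weights.

    `cues` maps a cue name to a list with one score per candidate.
    """
    standardized = [zscores(values) for values in cues.values()]
    return [sum(column) for column in zip(*standardized)]

# Hypothetical cues for three candidates (names and scales are illustrative).
cues = {
    "test_score": [70, 85, 90],
    "years_experience": [6, 2, 4],
    "interview_rating": [3, 4, 5],
}
print(equal_weight_scores(cues))
```

Here the third candidate scores highest. Whatever the candidate or the moment, the same inputs always produce the same output, which is the reliability the rule trades nuance for.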

“There is so much noise in judgment that a noise-free model of a judge achieves more accurate predictions than the actual judge does.”

In recent times, machine learning, or AI, has come to prominence in making predictions on the basis of vast troves of data. In many domains, AI predicts outcomes more accurately than human experts, although no model can foresee genuinely random events. Humans have little tolerance for error in machines, though they tolerate it in themselves. People making predictive judgments too often rely on gut instincts, leading to needless errors.

Wherever prediction exists, ignorance does also – and more than you might think. Admitting ignorance is the first step to addressing uncertainty, and an improvement over allowing overconfidence to flourish and noise to accumulate accordingly.

System noise, level noise and pattern noise contribute to error in different proportions.

When people jump to conclusions, they stick to them – either by substituting a simpler question for a difficult one, by “prejudging” and forcing a conclusion to match the prejudgment, or by forming coherent impressions quickly and declining to change them. These biases contribute to noise. Psychological biases produce statistical bias when everyone shares them; because different people have different biases, they also create system noise.

“Multiple, conflicting cues create the ambiguity that defines difficult judgment problems.”

When you face difficult, complex or ambiguous decisions, your mind seeks to satisfy two criteria: that your judgment is sound and that no better alternative exists. Yet what you believe and what you think others believe are not always consistent, and your mood, for example, can shift your judgment from one occasion to the next. These “pattern errors” contribute to pattern noise, which combines stable pattern noise and occasion noise.

Three factors contribute to stable pattern noise: the weights judges give to the different components of a case, their personal reactions to particular cases, and individual qualitative differences among their judgments. Add your unique experiences and personal quirks, and your judgments can become even noisier, though they may remain internally consistent in line with your personality.

Error breaks down in three successive steps:

  • Error into bias and system noise.
  • System noise into level noise and pattern noise.
  • Pattern noise into stable pattern noise and occasion noise.
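The book expresses these three steps as equations on squared quantities, with overall error measured as mean squared error (MSE):

```latex
\begin{aligned}
\text{MSE} &= \text{Bias}^2 + \text{System noise}^2 \\
\text{System noise}^2 &= \text{Level noise}^2 + \text{Pattern noise}^2 \\
\text{Pattern noise}^2 &= \text{Stable pattern noise}^2 + \text{Occasion noise}^2
\end{aligned}
```

Because the components add as squares, reducing noise lowers overall error just as much as reducing bias by the same amount.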

Noise contributes more to error than bias does. Among the different kinds of noise, pattern noise is significantly more prevalent than level noise, usually about twice as large.

Improve your judgments by using “decision observers” to reduce bias.

To improve judgments, conduct a noise audit by having multiple judges assess the same problems. The variability in their judgments is noise. If you have a problem with system noise, consider replacing individuals with simple rules or algorithms, but be aware that AI cannot replace human judgment entirely. Naturally, you want to line up the best judges to improve your error rate, but the factors that make someone a good judge are not always clear. Start with people who already have a reputation for good judgment. They will be confident in their judgments and able to explain their reasoning. With many years’ experience, they excel at forming coherent narratives.
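The arithmetic of a noise audit can be sketched as follows, assuming numeric judgments arranged in a table with one row per judge and one column per case (the sentences below are made up). System noise is measured case by case as the variability across judges; level noise is the variability of the judges' overall averages; pattern noise is what remains. With population variances, the squared components add up exactly. Separating stable pattern noise from occasion noise would additionally require the same judge to assess the same case more than once, which this sketch does not attempt.

```python
from statistics import mean, pvariance

def noise_audit(judgments):
    """Decompose system noise in a judges x cases table of numeric judgments.

    Returns (system noise^2, level noise^2, pattern noise^2).
    """
    n_judges, n_cases = len(judgments), len(judgments[0])
    case_means = [mean(judgments[i][j] for i in range(n_judges)) for j in range(n_cases)]
    judge_means = [mean(row) for row in judgments]
    grand = mean(judge_means)
    # System noise^2: variance across judges for each case, averaged over cases.
    system = mean(
        pvariance([judgments[i][j] for i in range(n_judges)]) for j in range(n_cases)
    )
    # Level noise^2: variance of the judges' overall averages (harsh vs. lenient).
    level = pvariance(judge_means)
    # Pattern noise^2: judge-by-case residual left after removing both averages.
    pattern = mean(
        (judgments[i][j] - judge_means[i] - case_means[j] + grand) ** 2
        for i in range(n_judges)
        for j in range(n_cases)
    )
    return system, level, pattern

# Hypothetical sentences (in years) from three judges for the same four cases.
judgments = [
    [3, 5, 2, 8],
    [4, 6, 3, 9],
    [2, 7, 2, 6],
]
system, level, pattern = noise_audit(judgments)
print(system, level + pattern)  # equal, up to floating-point rounding
```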

“Bias leads to errors and unfairness. Noise does too – and yet, we do a lot less about it.”

Alternatively, seek judges with a cognitive style featuring careful thought. These people interrogate information to ascertain whether it is accurate or trustworthy. They are usually more humble, as well as open to criticism and to changing their minds as facts change. When working on a noise audit, these people can observe the decision-making process and alert the team to unidentified biases.

Deploy “decision hygiene” methods to prevent noise before it happens.

Unlike bias, noise is unpredictable and hard to explain, which makes it harder to identify and fix. To address noise, focus on prevention, not cure. This approach, called decision hygiene, resembles handwashing among health professionals. You will never know exactly which errors you prevented, but you will have reduced their number statistically.

“Just like handwashing and other forms of prevention, decision hygiene is invaluable but thankless.”

Some methods for practicing decision hygiene include:

  1. Sequencing information to limit the formation of premature intuitions – Cognitive bias can affect many professions, such as forensic science. Give people only the information they need when they need it, and require them to document their judgments at every step.
  2. Aggregating multiple independent estimates – Forecasts are notoriously noisy, and many forecasters perform poorly. The easiest fix is to average several independent judgments: averaging n independent estimates cuts noise by a factor of the square root of n, although it does nothing to reduce a bias they all share.
  3. Developing diagnostic guidelines – Doctors rely on their training to diagnose disorders, and some are better at it than others. Having guidelines simplifies the process of diagnosis and reduces error.
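The effect of aggregation can be checked with a quick simulation. This is a sketch under stated assumptions: every hypothetical judge shares the same bias and adds independent, normally distributed noise of the same size, and the true value is fixed at 100. Real judges are rarely fully independent, so the gain in practice is usually smaller.

```python
import random
from statistics import mean, pstdev

def average_of_estimates(n_judges, sigma=10.0, bias=0.0, trials=10_000, seed=1):
    """Simulate averaging n independent estimates of a true value of 100.

    Each estimate = true value + bias shared by all judges + independent noise
    with standard deviation sigma. Returns (bias of the average, noise of the average).
    """
    rng = random.Random(seed)
    true_value = 100.0
    errors = []
    for _ in range(trials):
        estimates = [true_value + bias + rng.gauss(0, sigma) for _ in range(n_judges)]
        errors.append(sum(estimates) / n_judges - true_value)
    return mean(errors), pstdev(errors)

for n in (1, 4, 16):
    avg_bias, avg_noise = average_of_estimates(n, bias=5.0)
    print(f"{n:>2} judges: bias {avg_bias:.2f}, noise {avg_noise:.2f}")
```

The noise should come out close to 10, 5 and 2.5 as the panel grows from 1 to 4 to 16 judges, while the shared bias stays near 5 in every case.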

In ranking systems, noise occurs when judgments are absolute, not relative.

Everyone dreads performance reviews, which have grown increasingly complex over the years. Although they are ubiquitous, they are so noisy that they reveal little about an employee’s true worth. Defining the rating scale clearly is a decision hygiene method. Choose a single dimension, and rank employees against one another rather than rating them on absolute scales. Ranking reduces pattern noise and level noise, producing results that are more consistent and, because noise is a component of error, more accurate.
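A minimal sketch of why relative judgments help, assuming two hypothetical managers who agree on the relative merit of four employees but differ in leniency. Their absolute ratings differ by a constant offset, which is level noise, yet converting the ratings to ranks makes the two sets of judgments identical. Ties are not handled, and the example isolates level noise only: disagreements about the order of employees would still show up in the ranks.

```python
def to_ranks(ratings):
    """Convert absolute ratings to ranks (1 = best); assumes no ties."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    ranks = [0] * len(ratings)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Hypothetical ratings of the same four employees on a 1-10 scale.
lenient_manager = [9, 7, 8, 6]
strict_manager = [6, 4, 5, 3]  # same view of relative merit, three points harsher

print(to_ranks(lenient_manager))  # [1, 3, 2, 4]
print(to_ranks(strict_manager))   # [1, 3, 2, 4]
```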

“You can improve judgments by clarifying the rating scale and training people to use it consistently.”

Noise is also a problem when you’re hiring new people. Interviewers bring cognitive biases to the process: often, they rely on first impressions and then seek coherence. The solution? Structure complex judgments and aggregate different judges’ assessments. Google, for example, builds its hiring process on these principles:

  1. Decomposition – Break the decision down into components. That focuses the judges on the relevant information.
  2. Independence – Have each interviewer assess candidates separately, using predefined questions about their behavior in various situations.
  3. Delayed holistic judgment – Do not exclude your intuition about a candidate. Delay it. Form a committee to review all the data interviewers collected to make a collegial decision.

Google is data-driven, yet its final decision is not mechanical: the committee’s judgment is informed by averaging the interviewers’ combined scores.
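The aggregation step can be sketched as follows. This is a hypothetical illustration of decomposition and independence, not Google's actual rubric or tooling: the interviewer labels, the dimension names and the 1-4 scale are all assumptions. Each interviewer scores the same predefined dimensions without seeing the others' scores, and the scores are averaged per dimension to give the committee a profile before it forms a holistic judgment.

```python
from statistics import mean

# Hypothetical independent assessments on an assumed 1-4 scale.
assessments = {
    "interviewer_a": {"cognitive_ability": 3, "role_knowledge": 4, "leadership": 2},
    "interviewer_b": {"cognitive_ability": 4, "role_knowledge": 3, "leadership": 3},
    "interviewer_c": {"cognitive_ability": 3, "role_knowledge": 4, "leadership": 2},
}

def aggregate(assessments):
    """Average each predefined dimension across independent interviewers."""
    dimensions = next(iter(assessments.values())).keys()
    return {d: mean(scores[d] for scores in assessments.values()) for d in dimensions}

profile = aggregate(assessments)
print(profile)
# The committee reviews this profile and only then forms its holistic judgment.
```

Keeping the dimensions separate until the end is the point of the design: intuition still enters, but only after the independent evidence has been collected.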

Eliminating noise entirely may not always be worth the trouble.

Costs can outweigh benefits when you’re trying to eliminate noise. Paramount among these costs is unfairness, since mechanical judgments can’t replace human discernment, particularly when people’s lives are at stake. The financial costs may also be too much for public institutions such as schools and universities to bear.

Sometimes, noise reduction causes more errors than it fixes. For example, algorithms outperform humans in making noise-free judgments, but they can encode unacceptable biases. Humans value their judgment because it is more discerning and nuanced, and relies on moral underpinnings that no one wants to disregard. Mercy, for example, is a human quality that no one wants an algorithm to eliminate. Yet if noise-reduction methods are unfair or crude while the noise itself causes irredeemable unfairness, the solution is to create better noise-reduction methods, not to ignore the problem.

“It might be costly to remove noise – but the cost is often worth incurring. Noise can be horribly unfair.”

Social values evolve continuously, and flexibility in judgments can allow new values and beliefs to flourish. In workplaces, mechanical rules that govern your tasks can seem dehumanizing and can squelch creativity. Still, noise reduction is beneficial in rules-based systems.

Regarding standards – which are more open to interpretation and, therefore, judgment – reducing noise is more desirable. Standards are vague for a reason: They require more nuance. For example, a university may have a standard policy regarding sexual harassment, but not rules for how to behave in every situation. However, when you’re exercising judgment, remain aware that your goal is accuracy, not self-expression.

About the Authors


Princeton University emeritus professor Daniel Kahneman won the 2002 Nobel Prize in Economic Sciences and wrote the bestseller Thinking, Fast and Slow. Former McKinsey senior partner Olivier Sibony is a professor of strategy at HEC Paris and Saïd Business School, Oxford, and wrote You’re About to Make a Terrible Mistake! Harvard professor Cass R. Sunstein, a senior counselor in the Department of Homeland Security under the Biden administration, wrote the bestseller The World According to Star Wars.
