What every leader should be thinking about right now in order to limit biases and communicate critical data stories more effectively
Biases exist naturally; they’re simply part of the human condition. Still, biases significantly influence how each of us analyzes, interprets, and communicates data.
What’s more, those who aren’t classically trained data practitioners (those who never set out to build a career in research, analytics, or data science but now find themselves in a data-focused role) can often feel as if they’re “flying blind,” unaware of what they simply don’t know or can’t control.
This resource is intended to get to the crux of how to improve the quality of your data and mitigate biases so that you can learn to trust your data and make more well-informed decisions; it’s a “quick-start guide” because it outlines the most important activities every enterprise leader should be prioritizing right now in order to improve data efficacy and make better decisions.
It can definitely be challenging to break down our own systems of thinking, but luckily, our brains are already wired to want to make sense of our underlying assumptions, biases, and habits. At the end of the day, it's not about simply ‘following your gut;’ it's about becoming more aware of where that feeling is actually coming from (and why).
— Allison Hu, Director of Design, RevUnit
If left unchecked, biases can result in less objective decision making or, worse, costly financial mistakes.
So, if you’re tasked with collecting, interpreting, or communicating data (as nearly everyone is these days), start by better understanding the types of cognitive biases that are most likely to affect your judgment.
It’s worth pointing out that cognitive bias in and of itself deserves a much lengthier and more complex discussion than we can possibly cover here. So, we’ll briefly introduce a couple of the most common biases we’ve encountered in our own data work, particularly when working with large, complex organizations.
Induction bias, selection bias, and survivorship bias are other common cognitive biases that arise when working with data in any capacity. Covering each of those in detail is beyond the scope of this guide, so if you’re looking to go a bit deeper, we’d recommend this closer look at cognitive bias in data science.
There are a variety of bias mitigation techniques, yet one class in particular is intriguing given the growing capabilities of machine learning. These techniques are often labeled “machine-initiative” because visual analytic tools play a central role in bias mitigation: the machine operates as an unbiased collaborator that can act on behalf of the interpreter, or take initiative, to flag potentially biased analysis processes.
Individuals who are tasked with interpreting and analyzing data often assume that the data they’re working with is clean and accurate. In many cases, that’s correct. In many others, it’s both an incorrect and consequential assumption. It’s vital, then, that each individual make a concerted effort to thoroughly inspect and question critical source data.
— Colin Shaw, Director, ML/AI, RevUnit
More than 70 percent of organizational leaders have said that suboptimal data quality has negatively impacted their business decisions.
Now, there are a number of contributing factors at play here, yet it’s important to recognize that one of those factors is cognitive bias. In this case, individuals who are tasked with interpreting and analyzing data often assume that the data they’re working with is clean and accurate. In many cases, that’s correct. In many others, it’s an incorrect and consequential assumption, or a loosely held bias about the data itself.
The unfortunate reality of deep-rooted inconsistencies or glaring biases in your source data is that they make your decision-making process fragile. Most people, especially those downstream from the source data, seldom think to question the data they’re working with until there’s a clear and compelling reason to do so. The alternative path is much more effective: follow your own curiosities, ask questions of both your team and your data, and entertain possibilities beyond the ones you seek. To reduce unnecessary risk, then, it’s vital that each individual make a concerted effort to thoroughly inspect and question critical source data.
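As a rough sketch of what “questioning the data” can look like in practice, the snippet below runs a few basic checks (missing required fields, out-of-range values, exact duplicates) over a set of records. The record shape, field names, and valid ranges are all hypothetical, chosen purely for illustration.

```python
from collections import Counter

def sanity_check(rows, required_fields, valid_ranges):
    """Run basic quality checks over a list of record dicts.

    `rows`, `required_fields`, and `valid_ranges` are hypothetical
    inputs for illustration. Returns a dict of issue counts.
    """
    issues = Counter()
    seen = set()
    for row in rows:
        # Missing or empty required fields
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues[f"missing:{field}"] += 1
        # Numeric values outside the range you'd expect
        for field, (lo, hi) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not (lo <= value <= hi):
                issues[f"out_of_range:{field}"] += 1
        # Exact duplicate records
        key = tuple(sorted(row.items()))
        if key in seen:
            issues["duplicate_row"] += 1
        seen.add(key)
    return dict(issues)

# Hypothetical order records with deliberately planted problems
orders = [
    {"id": 1, "region": "west", "amount": 120.0},
    {"id": 2, "region": "", "amount": 95.5},       # missing region
    {"id": 3, "region": "east", "amount": -40.0},  # negative amount
    {"id": 1, "region": "west", "amount": 120.0},  # duplicate row
]
report = sanity_check(orders, ["id", "region"], {"amount": (0, 10_000)})
print(report)
# → {'missing:region': 1, 'out_of_range:amount': 1, 'duplicate_row': 1}
```

Checks this simple won’t catch subtle bias, but they surface the obvious inconsistencies that downstream consumers tend to assume away.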
Audit, catalog, and organize existing data to ensure reliability
This step is necessary both at an organizational and functional level.
Yet, in many cases, this step is either lacking entirely, or it’s been done rather haphazardly. Thus, it’s worth spending the time and energy necessary to map existing data flows within your specific function or team.
When doing so, identify any “dead ends” or silos you encounter along the way. Even if a governance team is already working on a similar initiative, it’s still worth a second look. You want to intimately understand where critical data is housed, how it’s collected, and where it moves over time (both upstream and downstream). Even if everything checks out, at worst you’ve verified the cleanliness and accuracy of critical data sets, which is hardly a bad thing.
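One lightweight way to reason about this mapping step is to treat your data flows as a directed graph and flag any dataset that nothing consumes downstream. The sketch below does exactly that; every dataset name here is hypothetical.

```python
def find_dead_ends(flows):
    """Given a mapping {dataset: [downstream consumers]}, return
    datasets with no downstream consumers (candidate dead ends).
    All dataset names are hypothetical."""
    nodes = set(flows)
    for downstream in flows.values():
        nodes.update(downstream)
    # A node is a candidate dead end if it feeds nothing else
    return sorted(n for n in nodes if not flows.get(n))

flows = {
    "crm_export": ["customer_master"],
    "pos_feed": ["sales_warehouse"],
    "customer_master": ["sales_warehouse"],
    "legacy_upload": [],  # collected but never consumed: a silo
}
dead_ends = find_dead_ends(flows)
print(dead_ends)  # → ['legacy_upload', 'sales_warehouse']
```

Note that legitimate terminal stores (like the warehouse here) surface alongside true silos, so the output is a review list for a human, not a verdict.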
There are a variety of tools that can assist you in this process, many of which will allow you to simplify and automate some of the critical tasks involved in discovering, profiling, and indexing data. In fact, if your organization already has a dedicated data governance team (or a group or business function acting in a similar capacity), it’s likely that an enterprise data intelligence platform is already in place. If that’s indeed the case, check with that team to better understand how the tool and technology stack can be used to assist in this process. Any sound data governance practice will be actively using these kinds of tools at a global level.
Still, despite the clear benefits of auditing, cataloging, and organizing existing data, only 20% of organizations say they publish data provenance and data lineage information internally, and most of those who don’t publish such information say they have no plans to start. Don’t make this mistake. Making this type of information readily accessible to all teams is one of the most important steps toward making data and its sources more transparent and toward building a more data-driven culture, both of which, if maintained and governed properly, are critical to improving data quality directly at the source.
Many teams tend to limit data interpretation and analysis to a small group of individuals—usually those in an analyst role or similar—who may lack the specific domain expertise needed in order to understand the full picture. Instead, seek out other individuals who have the domain expertise that you or your team might lack.
— CJ Weatherford, Principal Designer, RevUnit
Bias mitigation techniques take many forms; some are more intricate and complex than others.
Still, one of the best ways to reduce bias in your analysis and reporting process is to include those who will bring varied perspectives to the same set of data. While it seems like a no-brainer, many teams instead tend to limit data interpretation and analysis to a small group of individuals—usually those in an analyst role or similar—who may lack the specific domain expertise needed in order to understand the full picture.
Instead, seek out other individuals who have the domain expertise that you or your team might lack. It’s worth noting that you’re not gathering perspectives just for the sake of doing so. Rather, you’re looking for the specific individuals who, by the very nature of their work, have a vested interest in the data and thus should have a seat at the table. These individuals can often help identify flaws, spot patterns, or supply context that might otherwise go unnoticed or, worse, be misinterpreted.
Finally, for many of the same reasons, it’s worth implementing a review process to ensure that the narrative that you’re trying to communicate is clear, accurate, and actionable.
It’ll be difficult (if not impossible) to remove all biases from your reporting. That’s okay; perfection isn’t the goal, and it’s unattainable anyway. Instead, you’re looking to implement the safeguards necessary to ensure that your team and organization are making the best decisions with the best data available. Doing so consistently still requires a keen human eye, especially given our increased reliance on machine learning and artificial intelligence (AI) models and tools, particularly those used to collect, store, analyze, and visualize data.
In fact, Gartner predicts that, by 2023, 75% of large organizations will hire AI behavior forensic, privacy, and customer trust specialists to reduce reputation risk as a direct result of biased data. In addition, large organizations like Facebook, Google, Bank of America, NASA, and others have moved to appoint forensic specialists who primarily focus on uncovering undesired bias in AI models before they’re deployed. These specialists are validating models during the development phase and continuing to monitor them once they’re released into production, as unexpected bias can be introduced because of the differences between training and real-world data.
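To illustrate the kind of production monitoring described above, here’s a minimal, hypothetical drift check: it compares the distribution of one categorical feature in training data against live data using total variation distance. The feature, the data, and any alerting threshold are all assumptions for the sake of the example.

```python
from collections import Counter

def total_variation(train_labels, live_labels):
    """Total variation distance between two categorical distributions:
    half the sum of absolute differences in category proportions.
    0.0 means identical; values near 1.0 mean severe drift."""
    p, q = Counter(train_labels), Counter(live_labels)
    n_p, n_q = len(train_labels), len(live_labels)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)

# Hypothetical example: the region mix shifts after deployment
train = ["west"] * 50 + ["east"] * 50
live = ["west"] * 80 + ["east"] * 20
drift = total_variation(train, live)
print(round(drift, 2))  # → 0.3; alert when above a threshold you choose
```

A check like this won’t explain *why* the live data diverged from training data, but it flags the divergence early enough for a human reviewer to investigate before biased outputs pile up.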
Safeguards like a peer review process are critically important; the more thorough those safeguards, the easier it will be to identify, correct, and reduce incidences of obvious or unexpected bias. A review process remains one of the most effective ways to ensure that you’re not relying on inaccurate assumptions or biased preconceptions as the foundation of your data storytelling. More trained eyes should mean fewer mistakes.
Most organizations would readily recognize that they’re sometimes making ineffective decisions because of inaccurate or incomplete data. What’s more, anyone who’s charged with interpreting data is also subject to bias, which is a fundamental part of the human condition.
It’s important to recognize that biases exist naturally. In fact, it’s nearly impossible to eliminate biases altogether when working with mass volumes of data.
Still, it’s important to remember that data is hardly, if ever, 100% objective; it’s humans who give these figures meaning, drawing inferences from how they’re presented and interpreting them through our own lenses. It’s up to each individual, then, to better understand the ways in which biases affect our own perceptions and to take an active role in limiting the effects of cognitive bias in our interpretation, understanding, and reporting of critical data.