The Enterprise Leader's Quick-start Guide to
A guide to quickening your pace of data change through agile practices
According to Harvard Business Review, fewer than 10% of companies have exploited data value across their organization. You cannot simply invest in new data technology or infrastructure and expect results to follow.
It’s necessary to incorporate an agile approach in your data projects in order to move quickly, prove ROI, and change the way in which you leverage data itself — not just the way you build technology.
Whether your data project involves building a data model, a data visualization (report, dashboard, etc.), applying machine learning, or data integration, taking an agile approach will get you more successful results, faster.
How establishing cross-functional teams speeds up your data initiative
Appointing a dedicated, cross-functional team is the most effective approach organizations can take to improve overall data quality and speed up initiatives.
One of the core principles of agile is that business people and developers must work together daily throughout a project to ensure its success. This concept applies to any data projects you are tackling, too. Typically, there are barriers between different functions in the organization, but removing them through an agile approach will help create a clearer understanding and mutual success.
It’s important that you ensure all stakeholders of the project are represented on your team. Be sure to include someone to represent everyone who will be affected by your initiative — business, data, product, design, and tech/IT (depending on how your organization is structured). With this agile structure, you will be able to address one of the most fundamental issues nearly every enterprise faces.
“Data is controlled by people who understand data and needed by people who don’t.” — HBR Pulse Survey: Turning Data Into Unmatched Business Value
There’s also typically a limited amount of expertise within IT and business groups when it comes to newer data migration technologies and capabilities, as well as approaches to data delivery. Your tech team should be the experts in the infrastructure and technology, your business team the experts in the real-world usage/needs, and your data team the experts in where your data lives and what needs to happen to get to it. By creating cross-functional teams that include everyone across the data value chain, you will substantially increase your data initiative’s chances of success.
Simply keeping stakeholders informed is not enough.
We call stakeholders who are only kept informed diplomats. While diplomats may be able to give an opinion or represent a function of the business, they aren’t empowered to actually affect the outcome of the work being done. When bringing people into your cross-functional team, you need delegates, not just diplomats. Delegates are individuals who are dedicated to the work that is being carried out, and they can make adjustments or add new work in response to new developments in the project.
This is why delegates are so vital to your cross-functional team; you will need the political buy-in for your team to make decisions on their own. Waiting for approval or removal of red tape outside of your cross-functional team can substantially slow down progress. So make sure you not only have the right people, but that those people also have the autonomy to make calls on the work being done.
We suggest (and McKinsey agrees) that, to successfully deploy an agile data approach, your cross-functional team members should physically sit with members of the IT organization. Since many organizations have been working remotely and will likely continue to do so, this interaction needs to be replicated as closely as possible in the remote setting (through Slack groups, stand-ups, or syncs) so you can continue to collaborate and truly work as one team.
How working in iterative cycles speeds up your data initiative
In agile, your priority is always to satisfy data users and stakeholders through early and continuous delivery of valuable solutions.
But there is a lot of investigation, exploration, testing, and tuning required to make real sense of your data, and that’s where your MVP comes in (minimum viable product or, for our purposes, MVD: minimum viable data). When you determine your MVP, it needs to be more than just viable: your solution must also be achievable and desirable. To kick off this process, you must ask yourself, “What does the first success look like? What about the second? And what about the successes thereafter?” You will certainly expand out from your initial MVP, but it defines a starting point that allows you to show success (or fail forward) quickly.
Start with a pilot project to prove your concept and gather buy-in before you begin. Keep things simple and contained by beginning with one set of source data, building only one model, and choosing one report to tackle first.
Be mindful that data science projects also follow what’s known as the “law of diminishing improvement”. For example, once a particular model has achieved 70% accuracy, the next 5-10% of improvement takes considerably more effort to attain, and how far you can go also depends on limitations in the data set. Your team needs to decide whether the effort required is worth the incremental improvement.
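One way to make that decision less subjective is to track how much accuracy each round of tuning buys per day of effort. The sketch below is illustrative only: the TuningRound record, the example figures, and the threshold of one percentage point per day are all assumptions, not values from a real project. Your team would agree on its own threshold.

```python
from dataclasses import dataclass


@dataclass
class TuningRound:
    """One round of model tuning (all values are illustrative)."""
    effort_days: float  # team effort spent in this round, must be > 0
    accuracy: float     # accuracy after the round, between 0 and 1


def gain_per_day(previous: TuningRound, current: TuningRound) -> float:
    """Percentage points of accuracy gained per day of effort."""
    gained_points = (current.accuracy - previous.accuracy) * 100
    return gained_points / current.effort_days


def worth_continuing(rounds, min_gain_per_day: float) -> bool:
    """True if the latest round still cleared the team's agreed threshold."""
    if len(rounds) < 2:
        return True  # not enough history to judge yet
    return gain_per_day(rounds[-2], rounds[-1]) >= min_gain_per_day


# Hypothetical history: gains shrink while effort grows.
history = [
    TuningRound(effort_days=5, accuracy=0.60),
    TuningRound(effort_days=5, accuracy=0.70),
    TuningRound(effort_days=10, accuracy=0.73),
]

print(worth_continuing(history, min_gain_per_day=1.0))
```

In this made-up history, the last round gained about three points for ten days of work, which falls below the threshold, so the check suggests the team should talk about stopping or changing approach.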
Showing measurable progress is at the center of agile; earlier we referred to this as showing success (or failing forward) quickly. Working in this cadence, you should deliver frequently, typically every few weeks, with a preference for the shorter timescale. Working in these small, iterative cycles de-risks your project because you commit to only one step at a time.
Our team has found that working in sprint cycles helps to define what steps we need to take over a period of time and to continuously validate our work. We define a sprint as a period of one to four weeks in which we work to accomplish a set of predefined tasks and goals for the project.
After all, data requires a considerable amount of exploring and testing, which is one of the fundamental reasons why agile works so well. The basic steps of an iterative work cycle are to explore, build a hypothesis, then test it as quickly as possible — so every iteration should be testing new hypotheses.
To continue to work efficiently through agile, you and your team need to develop a backlog: a list of tasks (often features or user needs to address) to tackle.
The work should be broken down into manageable parts that can be accomplished within a single sprint and prioritized based on the importance of the task in relation to your MVD. That’s to say, backlog tasks should be identified and prioritized based on what will bring the most value with minimal effort — not every approach is worth trying, so cover the most promising ones first.
For example, your backlog could start with getting the data into a structured form so it can be analyzed. From there, it could hold a list of feature selection or feature engineering tasks, or a list of models to select, tune, and optimize.
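The "most value with minimal effort" rule can be sketched as a simple value-to-effort score. Everything here is hypothetical: the BacklogItem fields, the 1-to-5 scales, and the example scores are assumptions for illustration, and your team may weigh value and effort differently.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: int   # expected contribution to the MVD, 1 (low) to 5 (high)
    effort: int  # estimated effort, 1 (small) to 5 (large)

    @property
    def score(self) -> float:
        """Value delivered per unit of effort."""
        return self.value / self.effort


def prioritize(items):
    """Highest value-per-effort first; ties go to the smaller item."""
    return sorted(items, key=lambda item: (-item.score, item.effort))


# Illustrative backlog, with scores the team would assign together.
backlog = [
    BacklogItem("Tune model hyperparameters", value=3, effort=4),
    BacklogItem("Structure the source data", value=5, effort=2),
    BacklogItem("Engineer candidate features", value=4, effort=3),
]

for item in prioritize(backlog):
    print(f"{item.score:.2f}  {item.name}")
```

With these example scores, structuring the source data comes first, followed by feature engineering and then model tuning. The scoring itself matters less than the conversation it forces about what each task is worth to the MVD.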
Your data team should also capture and prioritize data debt. Typically, data sources lack owners or data governance implementations. While moving quickly to prove your data solution, it can be easy to finish with a heap of dirty data because you didn’t put the proper controls in place at the start. Your agile team should address this data debt iteratively, asking which underlying parts of it must be added to your backlog and prioritized. You do not want to roll out a solution to production and have it continue to create more data cleanliness problems down the road.
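As a rough sketch of capturing data debt, a team could keep a small inventory of its sources and turn any gaps into backlog entries. The source names, the owner and null_rate fields, and the 5% threshold below are all assumptions made for illustration; they do not come from a specific governance tool.

```python
def find_data_debt(sources, max_null_rate=0.05):
    """Return backlog entries for sources with no owner or too many nulls."""
    debt = []
    for source in sources:
        if not source.get("owner"):
            debt.append(f"Assign an owner for {source['name']}")
        if source.get("null_rate", 0.0) > max_null_rate:
            debt.append(f"Clean missing values in {source['name']}")
    return debt


# Hypothetical inventory of data sources.
sources = [
    {"name": "inventory_scans", "owner": "ops-data", "null_rate": 0.12},
    {"name": "product_catalog", "owner": None, "null_rate": 0.01},
]

for entry in find_data_debt(sources):
    print(entry)
```

Each entry it returns can then be sized and prioritized alongside the rest of the backlog rather than left for after launch.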
A large food processor was losing millions of dollars annually on lost inventory due to an existing inefficient system that was overly manual and error-prone. They needed help redesigning the process to more accurately identify and sort products on the production line.
Our cross-functional team determined a machine learning model would help improve product identification accuracy. We started small, with manually-validated training sets that were tested quickly to determine an appropriate machine learning model. When we got the model to a high accuracy, we slowly scaled the model to reach the real-world levels of data it needed to process.
By employing MVD and iterative cycles, we had a functioning model at more than 90% accuracy after just four weeks.
How understanding data users speeds up your data initiative
One of the biggest obstacles to enterprise initiatives is that the data they require is controlled by people who understand it, but needed by people who don’t — your data users.
You need to implement a user-focused approach in order to understand who will use your data solution so you can prioritize the most important elements of the project. This is where you should rely heavily on your business delegates to identify who will need to use your data solution.
For example, separate job roles might need different views of your dashboard, or maybe you’ll find that multiple teams or functions need to pull from your model or API. You need to be certain that you are addressing these up-front user needs and making adjustments accordingly.
Once you have defined who will use your data solution, you need to deeply understand their needs so you can adequately address them. Failing to do this will produce a solution that’s only partially successful at best.
A great framework we often use is the jobs to be done by your data users. This framework is an effective tool, especially when it comes to data solutions because it centers around the tasks and goals your users have when using the data in their work. Yes, you are wading into the territory of UX (user experience) design. Its role in data work is often overlooked, but it’s the lynchpin to your data project’s success — whether you call it design or not.
Understanding user needs is essential, and agile teams usually build that understanding through user-centered design (UCD). UCD relies on your understanding of users to create or improve your products and services, and it prioritizes user involvement at every step to ensure that the design solutions align with their needs.
It’s important to note that you need to balance your users’ real-world needs and understand their tasks in relation to your business objectives. Your ultimate goal is to empower your users to achieve business goals.
Once you’ve developed a deeper understanding of your various user groups and their real-world goals and tasks, you can then start building and iterating to address their needs.
But remember, when working in this cadence, user feedback and involvement shouldn’t stop after the first iterative cycle. It’s as much of an inherent part of the agile process as anything. You need to continue to bring them along for the full process, putting your proposed solutions in front of them for continued feedback and testing along the way.
As with everything else in the short, iterative cycles of the agile process, you will want to deploy as quickly and as often as you can to get your solution into the wild of your enterprise environment. From there, keep tabs on how well your solution is working and what improvements you can make, then add them to your backlog.
In most large organizations (and heck, even small ones), data silos are a real challenge. They are the single most common downfall of agile data initiatives. Silos can be political, or a natural evolution from pockets of the business needing certain data and finding their own ways to get it.
However they come about, you’ll want to tackle these head-on as you start to spin up your agile data initiative. Creating the cross-functional team will be critical to overcoming these silos. Including necessary delegates will naturally start to break those silos down, and it’s worth the extra effort, even when that means spending the time to understand what each group needs from your data. You won’t be able to take all the silos down at once, but start with just one initiative that spans multiple silos and you’ll start to prove the value of shared data.
Agile can and should be used to quicken the pace of your data projects, but we also want to stress that you and your team shouldn’t lean so far into agile that you become constrained by its rules. Understand that agile is a set of guiding principles, not necessarily requirements you must rigidly adhere to. Create an agile process that is uniquely suited to your needs and the individuals on your team.
Additionally, don’t overlook buy-in for your new process. It’s a critical box that must be checked in order for your team to have autonomy and make decisions on their own instead of waiting for approval. Without it, silos can persist and critical parts of your project can be held up.
Always be mindful of the fact that the agile process and data innovation don’t have a finish line. It’s a continuous journey that requires direct involvement and understanding of your users to quicken, refine, and improve your data initiatives.