Applying Agile to Speed Up Your Data Projects

01–

Establish cross-functional teams

Strong cross-functional teams enable a higher velocity on initiatives and ownership of outcomes, which are both key to an effective Agile strategy.

—

Mason McClelland

Director of Strategy

RevUnit

How establishing cross-functional teams speeds up your data initiative

Allows you to move faster by breaking down silos
Dedicates a team’s time and focus to the initiative
Brings in varied perspectives, which makes success more likely

Identify who needs to be on your small, cross-functional team

Appointing a dedicated, cross-functional team is one of the most effective ways organizations can improve overall data quality and speed up initiatives.

One of the core principles of agile is that business people and developers must work together daily throughout a project for it to succeed. This principle applies to your data projects, too. There are typically barriers between functions in an organization, and removing them through an agile approach builds clearer shared understanding and mutual success.

Make sure every stakeholder of the project is represented on your team. Include someone to represent each group your initiative will affect: business, data, product, design, and tech/IT (depending on how your organization is structured). With this agile structure, you will be able to address one of the most fundamental issues nearly every enterprise faces.

“Data is controlled by people who understand data and needed by people who don’t.” (HBR Pulse Survey: Turning Data Into Unmatched Business Value)

IT and business groups also typically have limited expertise in newer data migration technologies and capabilities, as well as in approaches to data delivery. Your tech team should be the experts in infrastructure and technology, your business team the experts in real-world usage and needs, and your data team the experts in where your data lives and what it takes to get to it. By creating cross-functional teams that include everyone across the data value chain, you substantially increase your data initiative’s chances of success.

Empower delegates, not just diplomats

Simply keeping stakeholders informed is not enough.

We consider stakeholders who are only kept informed to be diplomats. While diplomats may be able to give an opinion or represent a function of the business, they aren’t empowered to affect the outcome of the work. When bringing people into your cross-functional team, you need delegates, not just diplomats. Delegates are dedicated to the work being carried out, and they can adjust it or add new work in response to new developments in the project.

This is why delegates are so vital to your cross-functional team: your team needs the political buy-in to make decisions on its own. Waiting outside your cross-functional team for approvals, or for red tape to be cleared, can substantially slow progress. So make sure you not only have the right people, but that those people also have the autonomy to make calls on the work being done.

We suggest (and McKinsey agrees) that, to deploy an agile data approach successfully, your cross-functional team members should physically sit with members of the IT organization. Because many organizations have been working remotely and will likely continue to do so, replicate this interaction as closely as possible in the remote setting, through Slack channels, stand-ups, or syncs, so you can keep collaborating and truly work as one team.

Spotlight —

02–

Work in iterative cycles

Working in iterative cycles is the single best thing you can do to delight customers and de-risk your product.

—

Doug Mitchell

Vice President of Delivery

RevUnit

How working in iterative cycles speeds up your data initiative

Allows you to prove success or pivot quickly
Prevents your team from going too far down a path without validation
Defines checkpoints and a faster working cadence for your team

First, define your MVD

In agile, your priority is always to satisfy data users and stakeholders through early and continuous delivery of valuable solutions.

But making real sense of your data requires a lot of investigation, exploration, testing, and tuning, and that’s where your MVP (minimum viable product, or for our purposes MVD, minimum viable data) comes in. Your MVD needs to be more than viable: it also needs to be achievable and desirable. To kick off this process, ask yourself, “What does the first success look like? What about the second? And what about the successes thereafter?” You will certainly expand out from your initial MVD, but it defines a starting point that lets you show success (or fail forward) quickly.

Start with a pilot project to prove your concept and gather buy-in before committing to the full initiative. Keep things simple and contained: begin with one set of source data, build only one model, and choose one report to tackle first.

Be mindful that data science projects also follow what’s known as the “law of diminishing improvement,” a form of diminishing returns. For example, if a model has achieved 70% accuracy, the next 5 to 10 percentage points of improvement take considerably more effort to attain, and they may be capped by limitations in the data set itself. Your team needs to decide whether the necessary effort is worth the incremental improvement.
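To make that trade-off concrete, here is a minimal Python sketch that records accuracy at successive checkpoints and computes how many accuracy points each extra person-week of effort bought. The checkpoint values, the person-week unit, and the `min_gain_per_week` threshold are hypothetical. The sketch also assumes the most recent step's gain is a reasonable stand-in for the next one, which your team should sanity-check.

```python
from typing import List, Tuple

# Each checkpoint is (cumulative_effort_in_person_weeks, accuracy_percent).
# Effort must strictly increase from one checkpoint to the next.
Checkpoint = Tuple[float, float]


def marginal_gains(checkpoints: List[Checkpoint]) -> List[float]:
    """Accuracy points gained per person-week between consecutive checkpoints."""
    gains = []
    for (effort_0, acc_0), (effort_1, acc_1) in zip(checkpoints, checkpoints[1:]):
        gains.append((acc_1 - acc_0) / (effort_1 - effort_0))
    return gains


def worth_continuing(checkpoints: List[Checkpoint], min_gain_per_week: float) -> bool:
    """Keep tuning only while the latest step still clears the team's threshold.

    Treats the most recent gain as a proxy for the next one (an assumption)."""
    gains = marginal_gains(checkpoints)
    return bool(gains) and gains[-1] >= min_gain_per_week


# Hypothetical history: quick early wins, then slower progress past 70%.
history = [(1, 50.0), (2, 65.0), (3, 70.0), (5, 73.0), (8, 75.0)]
print(marginal_gains(history))
print(worth_continuing(history, min_gain_per_week=1.0))
```

With these made-up numbers, the last step bought less than one accuracy point per person-week, so the sketch recommends stopping. Where to set the threshold is exactly the team decision described above.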

Take small steps to fail fast, working in sprints

Showing measurable progress is at the center of agile; earlier we referred to this as failing forward. In this cadence, you should deliver frequently, typically every few weeks, with a preference for the shorter timescale. Working in small, iterative cycles de-risks your project because you take one validated step at a time.

Our team has found that working in sprints, which we define as periods of one to four weeks in which we work to accomplish a predefined set of tasks and goals for the project, helps us define the steps we need to take over a period of time and continuously validate our work.

After all, data requires a considerable amount of exploration and testing, which is one of the fundamental reasons agile works so well. The basic steps of an iterative work cycle are to explore, form a hypothesis, and then test it as quickly as possible, so every iteration should be testing new hypotheses.

Work from a prioritized backlog of work

To keep working efficiently in agile, you and your team need to build a backlog: a list of tasks to tackle (often features or user needs to address).

The work should be broken down into manageable parts that can each be completed within a single sprint, and prioritized by importance relative to your MVD. In other words, identify and prioritize backlog tasks by what will bring the most value with the least effort. Not every approach is worth trying, so cover the most promising ones first.

For example, your backlog might begin with getting the data into a structured form so it can be analyzed. From there, it could list feature selection or feature engineering tasks, or candidate models to select, tune, and optimize.
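One way to apply the value-versus-effort rule above is sketched below in Python. It is a minimal illustration, not a prescribed method: the `BacklogItem` fields, the 1 to 10 value scale, the story-point effort estimates, and the sprint capacity are all hypothetical, and filling a sprint greedily by value-to-effort ratio is only one simple heuristic. It also flags items too large for a single sprint so they can be broken down first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacklogItem:
    name: str
    value: int   # hypothetical relative value toward the MVD, 1 (low) to 10 (high)
    effort: int  # hypothetical effort estimate in story points


def prioritize(items: List[BacklogItem]) -> List[BacklogItem]:
    """Order items by value per unit of effort, highest first."""
    return sorted(items, key=lambda item: item.value / item.effort, reverse=True)


def too_big(items: List[BacklogItem], capacity: int) -> List[str]:
    """Names of items that cannot fit in one sprint and should be split up."""
    return [item.name for item in items if item.effort > capacity]


def plan_sprint(items: List[BacklogItem], capacity: int) -> List[BacklogItem]:
    """Greedily fill one sprint with the highest-ratio items that still fit."""
    sprint, used = [], 0
    for item in prioritize(items):
        if used + item.effort <= capacity:
            sprint.append(item)
            used += item.effort
    return sprint


# Hypothetical backlog for a pilot: one source, one model, one report.
backlog = [
    BacklogItem("Ingest one source table", value=8, effort=3),
    BacklogItem("Build baseline model", value=9, effort=5),
    BacklogItem("Draft first report", value=6, effort=2),
    BacklogItem("Tune hyperparameters", value=4, effort=8),
]

print([item.name for item in plan_sprint(backlog, capacity=10)])
print(too_big(backlog, capacity=6))
```

With a capacity of 10 points, the sketch picks the report, the ingestion, and the baseline model, which mirrors the one-source, one-model, one-report pilot; hyperparameter tuning waits for a later sprint.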

Your data team should also capture and prioritize data debt. Data sources often lack owners or data governance. While you move quickly to prove your data solution, it is easy to end up with a heap of dirty data because the proper controls weren’t in place at the start. Your agile team should address this data debt iteratively, deciding which underlying issues must be added to the backlog and prioritized. You do not want to roll out a solution to production and have it keep creating data cleanliness problems down the road.
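As a minimal sketch of capturing data debt as backlog items, the Python below checks a data source for a missing owner and for columns whose share of missing values exceeds a threshold. The source name, the sample rows, and the 5% threshold are hypothetical, and null rates are only one example of the controls a team might put in place; real governance checks would cover more.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def null_rates(rows: List[Row]) -> Dict[str, float]:
    """Fraction of rows where each column is absent or None, keyed by column name."""
    if not rows:
        return {}
    columns = sorted({column for row in rows for column in row})
    return {
        column: sum(1 for row in rows if row.get(column) is None) / len(rows)
        for column in columns
    }


def data_debt_items(
    source: str, owner: Optional[str], rows: List[Row], max_null_rate: float = 0.05
) -> List[str]:
    """Turn ownership gaps and high null rates into backlog item descriptions."""
    items = []
    if owner is None:
        items.append(f"Assign an owner to {source}")
    for column, rate in null_rates(rows).items():
        if rate > max_null_rate:
            items.append(f"Investigate {rate:.0%} missing values in {source}.{column}")
    return items


# Hypothetical sample from an unowned source.
sample = [
    {"sku": "A1", "weight": 2.0},
    {"sku": "A2", "weight": None},
    {"sku": None, "weight": 1.5},
    {"sku": "A4", "weight": None},
]
for item in data_debt_items("line_scans", None, sample):
    print(item)
```

Each resulting string can go onto the backlog and be prioritized alongside feature work, so the debt is paid down iteratively rather than all at once.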

Spotlight —

How one of the world’s largest food processors used iterative cycles to develop a >90% accurate machine learning model in just 4 weeks.

A large food processor was losing millions of dollars in inventory annually because of an inefficient existing system that was overly manual and error-prone. They needed help redesigning the process to identify and sort products on the production line more accurately.

Our cross-functional team determined that a machine learning model would improve product identification accuracy. We started small, with manually validated training sets that we tested quickly to select an appropriate machine learning model. Once the model reached high accuracy, we gradually scaled it up to the real-world data volumes it needed to process.

By employing MVD and iterative cycles, we had a functioning model at more than 90% accuracy after just four weeks.

03–

Involve and understand the ultimate users of your data

It is prudent to know and understand the objectives and needs of those using your data, for it to be of any real use. The more focused you are around specific user objectives, the better decision making you enable for both them and their organization.

—

Corey Campbell

Director of Design

RevUnit

How understanding data users speeds up your data initiative

Increases adoption and usage of your data
Gets to what’s actually important to address with your project
Helps you roll out your solution in phases vs. a massive reveal (and potential rejection by your users)

Define your data users

One of the biggest obstacles to enterprise initiatives is that the data they require is controlled by people who understand it, but needed by people who don’t — your data users.

Take a user-focused approach to understand who will use your data solution, so you can prioritize the most important elements of the project. This is where you should rely heavily on your business delegates to identify who will need to use it.

For example, separate job roles might need different views of your dashboard, or you may find that multiple teams or functions need to pull from your model or API. Make sure you address these user needs up front and adjust accordingly.

Understand their needs, goals, and tasks through interviews and real-world observations

Once you have defined who will use your data solution, you need to deeply understand their needs so you can adequately address them. Failing to do this will produce a solution that’s only partially successful at best.

A great framework we often use is jobs to be done, applied to your data users. It is especially effective for data solutions because it centers on the tasks and goals your users have when they use data in their work. Yes, you are wading into the territory of UX (user experience) design. Its role in data work is often overlooked, but it’s the linchpin of your data project’s success, whether you call it design or not.

Understanding user needs is essential, and in agile it is usually put into practice through user-centered design (UCD). UCD relies on your understanding of users to create or improve your products and services, and it prioritizes user involvement at every step so that design solutions align with their needs.

It’s important to balance your users’ real-world needs against your business objectives and to understand their tasks in relation to those objectives. Your ultimate goal is to empower your users to achieve business goals.

Build and iterate to their needs

Once you’ve developed a deeper understanding of your various user groups and their real-world goals and tasks, you can then start building and iterating to address their needs.

But remember: in this cadence, user feedback and involvement shouldn’t stop after the first iterative cycle. It is an inherent part of the agile process. Bring your users along for the full process, putting your proposed solutions in front of them for continued feedback and testing along the way.

As with everything else in agile’s short, iterative cycles, you will want to deploy as quickly and as often as you can to get your solution into the wild of your enterprise environment. From there, keep tabs on how well your solution is working and add possible improvements to your backlog.

Spotlight —

The enemy of agile: data silos

In most large organizations (and heck, even small ones), data silos are a real challenge. They are the single most common downfall of agile data initiatives. Silos can be political, or they can evolve naturally as pockets of the business need certain data and find their own ways to get it.

However they come about, you’ll want to tackle these head-on as you start to spin up your agile data initiative. Creating the cross-functional team will be critical to overcoming these silos. Including necessary delegates will naturally start to break those silos down, and it’s worth the extra effort, even when that means spending the time to understand what each group needs from your data. You won’t be able to take all the silos down at once, but start with just one initiative that spans multiple silos and you’ll start to prove the value of shared data.