I recently wrote a piece on the impact hypothesis–that guess we make about how we turn the output of a data science project into impact. The tldr (but, you know, actually do read it) is that successful data science projects often fail to produce impact because we assume that they will. And that needs to change.

The Idea, in brief

Too often it’s taken for granted that if the algorithm is performing well then the business will improve. However, there’s almost always a step we need in order to take data science output (e.g. predicting churned customers) and create impact (e.g. fewer churned customers).

The step in between is the how we transform data science output into impact (e.g. targeting those predicted would-churn customers with promotions). It’s often assumed the how will happen. But if your assumption proves false, even amazing data science can not be impactful.

Instead of assuming the how will happen, we should explicitly state it as a hypothesis, critically scope it, and communicate it to all stakeholders.

This task should generally fall to the project manager or a nontechnical stakeholder–but its results should be shared with nontechnical and technical team members alike (that includes the data science team!).

If you incorporate this into your project scoping process, I guarantee you’ll have much more impactful projects and a much happier data science team.

So how come you’ve never hear of it?

At this point you may be wondering, if the impact hypothesis is so important why is it so often overlooked? And also, why have you never heard of it?

We conflate resource allocation with importance.  The data science part of the project is the part that demands the most investment in terms of time, personell-power, and opportunity costs.  Because we allocate it those resources, we also tend to give it all the attention during planning as well.

In reality, planning and scoping should be dedicated according to how critical something is to the success of a project. (Of course, don’t take longer than is need to accomplish the task. Dur.)  We know that if the impact hypothesis fails, the project can not succeed. Therefore it easily merits a dedicated critique.

The impact hypothesis is implied. This is a problem for many reasons. The biggest is that implicitness removes ownership.  So often the impact hypothesis goes unstated, leaving it both nebulous and anonymous.

The hypothesis needs to be made explicit so that it can be challenged, defended, and adjusted.  Without a clear statement, the how goes without critique because there is nothing to be critiqued.

If someone does challenge an unstated hypothesis, it is likely to become a shapeshifter of sorts; one that morphs slightly whenever a hole is identified, as hands are waved about it, dismissing and dodging valid concerns.

Without ownership there is no one to come to when a flaw in the hypothesis is uncovered. Just the act of writing down the hypothesis will clarify what it is and where it needs to be vetted.

The hypothesis is assumed to be correct.  This is in part a result of attribution issue. Having no notion of ownership often implies its ‘common knowledge.’ That is, a lack of ownership implies no one owns the idea because it’s so obvious correct. We should accept ‘obvious’ for an impact hypothesis. It’s too critical to the project.

Sometimes the hypothesis is assumed to be correct because it appears to be straightforward or trivial. Unfortunately, the appearance of simplicity often belies an idea that is no simpler to execute than those that are clumsy to describe.

Even if the hypothesis really is simple, it still needs to be challenged.  This hypothesis is the only thing that connects the data science project to the business impact.  It’s the only thing that justifies devoting the resources and personnel to a major project. Certainly it merits its own scoping.

It wasn’t named. The step didn’t have a name. That shortcoming is in part because we’re all still figuring out this new(ish) data science thing. This idea isn’t novel–nontechnical scoping is already a part of effective cross-team collaboration. Naming it is just a way to acknowledge that nontechnical scoping it is also critical in data science projects.

Now go forth and hypothesize!