One of the biggest reasons data science projects fail is poor problem framing, a risk that early intervention can largely mitigate.
If you have worked on data science projects, you have probably seen this happen. A company or startup takes on a project that is later scrapped, or whose problem statement changes. The change might come from upper management or the client, from a failure to get the desired results, or from incomplete data. I recently read a good article from MIT Sloan Management Review that highlights key points for tackling these problems [1], and I want to share my views on it. The failure rate of data science initiatives is high, often estimated at approximately 70 to 80% [1]. (This is consistent with PMI's project failure estimates. - Greg Morris)
In my experience, these failures can be attributed to the following reasons:
- Not involving the right stakeholders, people who speak the language of both data and business, in defining the problem
- Too little research and brainstorming: defining a problem is hard work and takes multiple iterations to get right
- Moving ahead with the problem without properly analyzing data and resource availability
- The team starting to analyze data before everyone has agreed on the problem to be solved
- Confusing the problem with its proposed solution
- Not defining initial project milestones, which are necessary to keep stakeholders on the same page
How to Move Towards Better Problem Definition?
A better problem definition keeps stakeholder expectations in check, saves a lot of time by reducing unnecessary iterations, and gives developers, analysts, data scientists, and product managers a clearer understanding of the product. Involving someone who speaks the language of both data and business is especially useful in this process. Such people act as a bridge between business teams and data science teams, which makes them ideal for enforcing the principles applied during problem definition. Some of these principles are listed below:
- Involve people with both data and business acumen. This ensures that your problem definition has the correct inputs, achievable expectations, and initial milestones that keep everyone involved on the same page.
- Leaders should allow plenty of time to rigorously define the problem. As you have probably seen in your own team, problem statements often change as people work to get them right. Leaders of a data science project should allow plenty of time and encourage brainstorming, debate, and detailed documentation of the problem statement as it evolves, which keeps all stakeholders on the same page.
- Do root cause analysis (RCA) to understand the problem better. Frame the problem in terms of data complexity, data availability, and data liability. A well-defined problem is valuable, but it must also be supported by the data and infrastructure available in the organization. Do not confuse the problem with its proposed solution. For example, suppose a social media product gets less engagement than a similar competing service, and management believes the competitor uses a more advanced recommendation engine. It is tempting to define the problem as "building a better recommendation engine to increase product engagement." But that statement presupposes that a more sophisticated recommender model is the solution, without considering other options such as improving the push notification algorithm or building a better UI. Confusing the problem with a proposed solution shows that the problem is not well understood. It also limits team and individual creativity and creates confusion among potential problem-solvers.
Keep in mind that leaders should ensure the following objectives are met before moving on to the solution:
- The problem definition should be clear, and solving it should lead to a good business outcome
- Make sure the problem definition considers constraints involving time, initial milestones, budget, technology, data complexity, data availability, data liability, and the relevant people/stakeholders. These should be clearly defined so that the problem statement does not drift out of alignment with business objectives
- Make sure all involved stakeholders are on the same page