Data science without a sense of business is like playing baseball without a baseball. For every business, the ultimate goal of a data science project is to make its products or services better. Leaving that out of the picture is nonsensical.
Your data team could feature the best coders and the best statisticians, but if they don’t know the actual business application of their data projects, the whole thing will be pointless.
It sounds rather trite, but to achieve business results, you must be guided by definable, tangible business goals. This is easier said than done. Using a repeatable, highly refined and holistic process is the only way to keep all the disparate parties true to the business outcomes our projects aim to achieve.
The Business Data Science Mindset
The goal is to improve the quality of the product or service, not to generate more profit. This doesn’t mean that you won’t make more money because of your data science projects; we simply want to highlight the priorities.
At a high level, you can achieve two things with data science.
#1: Understanding your audience better: Learning about their needs, their struggles, their motivations, their habits and their relationships to your product or service.
#2: Using this understanding: Once you understand your audience, you can create a better product or service, which translates into increased profit.
The order is critical.
The top priority is that this process should help your consumers: members of your company, external customers and partners. Basically, anyone whose efficiency and knowledge can translate into increased performance. As a consequence, your product or service will flourish, and that better product or service will bring you more users, more returning users and eventually more revenue.
A Data Science Project Step-by-Step
From data to information – and from information to better decisions.
At its core, (almost) every data project plays the same role in your business. Data science helps you make easier, faster, and better decisions.
As simple as it sounds, it can get complicated in real life.
Let’s take a look at the typical six steps of a data science project:
- Data Collection
- Data Storage
- Data Cleaning
- Data Analysis
- Communication, data visualization
- Data-driven decisions
Data collection is where many businesses already fail. “Garbage in, garbage out” is perhaps the oldest data science adage, and unfortunately it remains extremely relevant today.
(1) Data Collection
Too many data projects fail at this very first step. Too many companies collect incomplete, unreliable data, and everything they do after that is simply messed up.
This is one of the benefits of the six-step process. In most traditional firms, data is collected and disseminated in various forms, mostly Excel. There are many points of failure on the way into that format, and once the data is in there, it is extremely difficult to validate it consistently and continuously.
True story: we once worked with a financial advisory firm that had been using the same spreadsheet templates for years. It turned out that at some point, probably in the distant past, one of the formulas had been modified ever so slightly. For years and years, the firm had inconsistent numbers across the board.
It wasn’t until we worked with them to get clean data into an air-gapped data warehouse that this inconsistency surfaced. Once the data was in the warehouse, our established Data Governance Board (the greatest check and balance that exists) realized during validation that the numbers were clearly out of whack. After a short search, it became apparent where the bad data was coming from.
The worst part of this story was not that they had to rerun the reports. The larger, foundational issue was that they could no longer trust their data: we had to double-check and triple-check everything before drawing conclusions. The good news is that they now operate under a strict but robust data governance structure that has given the entire firm a breath of fresh air. It was a culture-changing event.
The moral of the story: proper tracking and data collection are crucial for every business doing data science.
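The validation that exposed the drifted formula can be automated. Below is a minimal sketch, assuming each row keeps its raw inputs alongside the derived figure: recompute the derived value and flag rows where the stored number disagrees. The `Row` type, the field names (`quantity`, `unit_price`, `total`) and the tolerance are hypothetical, not the client's actual template.

```python
from dataclasses import dataclass


@dataclass
class Row:
    row_id: str
    quantity: float
    unit_price: float
    total: float  # derived value as stored in the spreadsheet or report


def find_inconsistent_rows(rows, tolerance=0.01):
    """Return ids of rows whose stored total differs from quantity * unit_price."""
    mismatches = []
    for row in rows:
        expected = row.quantity * row.unit_price  # recompute from the raw inputs
        if abs(row.total - expected) > tolerance:
            mismatches.append(row.row_id)
    return mismatches


rows = [
    Row("A1", 10, 2.5, 25.0),
    Row("A2", 4, 3.0, 12.0),
    Row("A3", 7, 1.2, 8.0),  # stored total has drifted away from 8.4
]
print(find_inconsistent_rows(rows))  # ['A3']
```

Running a check like this on every load turns a years-long silent error into a flagged row on day one.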
What to collect?
We get many questions about what to collect and what not to collect. There is no catch-all answer. We have a customized (and trademarked) methodology that offers an inclusive, participatory approach involving all project stakeholders. This information is assembled, the data is triaged, and consensus is established in our Data Governance Boards.
As a rule, we collect whatever we can, as long as the cost of acquiring it is not unreasonable. Data that is not needed today may be useful in the future, and there are few downsides to collecting it, as storage and computational costs are now very low.
(2) Data Storage and (3) Data Cleaning
Automate it and don’t forget to maintain it!
Data storage and data cleaning are the responsibility of data engineers. It’s a highly technical job, probably the most technical aspect of any of our engagements. This is also where we spend most of our time in ongoing Data Governance Boards (DGBs): our clients know how to use the information, and it’s our job to get them the right data to turn into information using their acumen, company IP, etc. A good portion of our DGB time is spent validating our findings with business leaders.
The good news? As a data consumer, you no longer need to worry about this. No more googling or opening up multiple sources to validate what you are putting into a spreadsheet. What used to *break* your thinking process is now handled in a separate, contained discussion. Freeing you up to make more focused and better decisions is a culture-changing event for a company!
Data storage follows the same rule as regular storage: if it’s not backed up in at least two separate places, it’s not backed up. With data, however, the variables change somewhat. Because of how data is accessed, it’s important to build a structure in which data is not only spread out in a fault-resistant manner but also stored in a form that can handle the increased load that reporting often puts on it. The Traceability Matrix (an artifact of your DGB) also demands that each record be auditable. It’s not so much a matter of recovering the actual data as of recovering all of the metadata and history that go along with it.
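To make the “two separate places” rule and per-record auditability concrete, here is a minimal sketch. Two local directories stand in for genuinely separate storage systems, and the audit fields (`source`, `stored_at`, `sha256`) and function names are hypothetical illustrations, not a prescribed Traceability Matrix format.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def _checksum(data: dict) -> str:
    # Canonical JSON (sorted keys) so identical data always hashes identically.
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


def store_record(record: dict, record_id: str, source: str, locations: list[Path]) -> str:
    """Write the record plus audit metadata to every location; return its checksum."""
    checksum = _checksum(record)
    envelope = {
        "data": record,
        "audit": {
            "source": source,  # where the record came from
            "stored_at": datetime.now(timezone.utc).isoformat(),
            "sha256": checksum,  # lets us detect silent corruption later
        },
    }
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        (location / f"{record_id}.json").write_text(json.dumps(envelope))
    return checksum


def verify_copies(record_id: str, locations: list[Path]) -> bool:
    """True only if every copy exists and its data still matches its checksum."""
    for location in locations:
        path = location / f"{record_id}.json"
        if not path.exists():
            return False
        envelope = json.loads(path.read_text())
        if _checksum(envelope["data"]) != envelope["audit"]["sha256"]:
            return False
    return True


# Usage: two temporary directories stand in for two separate storage systems.
with tempfile.TemporaryDirectory() as tmp:
    copies = [Path(tmp, "primary"), Path(tmp, "secondary")]
    store_record({"client_id": "C-001", "balance": 1250}, "rec-1", "crm_export", copies)
    print(verify_copies("rec-1", copies))  # True
```

Keeping the metadata inside the same envelope as the data means a restore brings back the history along with the record.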
Data storage and data cleaning are something you should maintain continuously, and an area where you should be prepared for “crisis situations,” too. In successful data projects, our data scientists tend to this garden frequently.
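Automating cleaning means writing the rules down once and running them on every load. Below is a minimal sketch, assuming a hypothetical three-field schema (`customer_id`, `date`, `amount`): it normalizes headers and whitespace, rejects rows with missing or non-numeric values, and drops exact duplicates. Real pipelines will need many more rules; these are illustrative.

```python
REQUIRED_FIELDS = ("customer_id", "date", "amount")  # hypothetical schema


def clean_records(raw_rows):
    """Normalize, validate and de-duplicate raw rows.

    Returns (clean_rows, rejected_rows) so rejected data can be reviewed
    instead of silently disappearing.
    """
    clean, rejected, seen = [], [], set()
    for raw in raw_rows:
        # Normalize header names and trim whitespace around string values.
        row = {
            key.strip().lower(): value.strip() if isinstance(value, str) else value
            for key, value in raw.items()
        }
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            rejected.append(row)
            continue
        try:
            # Accept thousands separators such as "1,200.50".
            row["amount"] = float(str(row["amount"]).replace(",", ""))
        except ValueError:
            rejected.append(row)
            continue
        key = tuple(row[field] for field in REQUIRED_FIELDS)
        if key in seen:  # duplicate of a row we already kept
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected


raw = [
    {" Customer_ID ": "C1", "Date": "2024-01-05", "Amount": "1,200.50"},
    {"customer_id": "C1", "date": "2024-01-05", "amount": "1200.50"},  # same row, cleaner format
    {"customer_id": "C2", "date": "2024-01-06", "amount": ""},  # missing amount
    {"customer_id": "C3", "date": "2024-01-07", "amount": "n/a"},  # not a number
]
clean, rejected = clean_records(raw)
print(len(clean), len(rejected))  # 1 2
```

Returning the rejected rows, rather than discarding them, gives the governance discussion something concrete to review.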
(4) Data Analysis – extracting value
This is where business data science gets exciting – for business people at least.
A data analyst is a sculptor. He or she gets a block of data and then carves and carves until something truly special emerges.
And it’s a creative process, indeed. It’s a frequent touchpoint in DGBs and the item that requires the most back-and-forth. Representation of information is something that continually shifts and evolves.
I’m a data analyst at heart and I know from experience that when you have an ocean of data in front of you, it can be very intimidating. Often, you don’t know where to start.
But there are a few guidelines that can help. Here are the top three that helped me:
1. Good questions. To get useful answers, you have to ask the right questions. These usually come from management (or other colleagues), who already have suspicions based on their experience. The first part of our engagement always includes a comprehensive interview session with every person who is a critical consumer of data in your organization. From there, a data analyst’s primary job is to prove or disprove these suspicions (let’s call them hypotheses).
Note: A common misconception is that disproving a hypothesis is a step backwards. People see it as the failure of an idea… That’s the wrong mindset, though. When a good data analyst proves or disproves an idea, she discovers many new things along the way, so she can offer one or more alternative solutions that are better than the original idea.
Let me also emphasize the “good” in “good question.” Answering bad questions sets a data project back significantly. Bad questions can be:
- Unimportant questions (“What happens if we change the font on this diagram?”)
- Questions that aren’t business-related.
- Vague questions (“How do people like us?”)
- Questions that we don’t (and won’t) have data to answer.
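Proving or disproving a hypothesis usually means testing it statistically. One common option (our choice for illustration, not the only valid test) is a two-sided two-proportion z-test, e.g. for the hypothesis “the new onboarding flow converts better than the old one.” The function name and the conversion counts below are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Uses the pooled proportion for the standard error and returns
    (z statistic, p-value). Assumes reasonably large samples.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B data: old flow 200 of 2000 converted, new flow 260 of 2000.
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, z comes out at roughly 3 and p well below 0.05, so the data support the hypothesis. A large p-value would not prove the idea wrong; it would only mean the data cannot distinguish the two flows, which is itself a useful finding.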
2. Qualitative research.
Often, when we don’t know where to start with our data analysis, we go to the analysts (or similar) who currently live deep within Excel.
There is nothing like seeing a real user interact with your data. One of the greatest errors we can make as data analysts is to assume that everything before this part of the data journey was bad and everything after it is good. Understanding where people are coming from not only lets us get useful charts up and running quickly but also accelerates adoption, because users are operating within an ecosystem that is comfortable to them. As the producers of this information, we can end up in the odd position of explaining the positives of a system that looks largely unchanged. That criticism would be fair in an environment that didn’t approach this endeavor with a continuous-improvement mindset, but here the dashboard is ever-evolving!
3. Best practices. Now that I’m a more experienced data analyst, I know quite a few data analysis techniques that are worth starting my research with.
It really depends on the given data project and on the specific business use case.
But at online businesses, I usually start my discovery process with a funnel analysis, a segmentation or a retention analysis project. (More about this in later articles.)
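A funnel analysis counts how many users make it through each step of a sequence and what share of the previous step converts. Below is a minimal sketch, assuming events arrive as (user_id, event_name) pairs. The step names are hypothetical, and for brevity this version ignores event timestamps, requiring only that a user reached every earlier step.

```python
from collections import defaultdict

FUNNEL_STEPS = ["visit", "signup", "activate", "purchase"]  # hypothetical step names


def funnel_report(events, steps=FUNNEL_STEPS):
    """Return (step, users_reaching_step, conversion_from_previous_step) tuples.

    A user counts for a step only if they also reached all earlier steps
    (event timing is ignored). The first step has no conversion rate (None).
    """
    reached = defaultdict(set)
    for user_id, event in events:
        reached[event].add(user_id)
    report = []
    qualified = None  # users who made it through all previous steps
    for step in steps:
        users = reached[step] if qualified is None else reached[step] & qualified
        if qualified is None:
            rate = None
        else:
            rate = len(users) / len(qualified) if qualified else 0.0
        report.append((step, len(users), rate))
        qualified = users
    return report


events = [
    ("u1", "visit"), ("u2", "visit"), ("u3", "visit"), ("u4", "visit"),
    ("u1", "signup"), ("u2", "signup"), ("u3", "signup"),
    ("u1", "activate"), ("u2", "activate"),
    ("u1", "purchase"),
    ("u5", "purchase"),  # never visited, so excluded from the funnel
]
for step, users, rate in funnel_report(events):
    print(step, users, "-" if rate is None else f"{rate:.0%}")
```

The step with the lowest conversion rate is usually the first place to dig deeper.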
If you start with business analytics today
If you start with data science for your business today, I’d recommend focusing on one specific thing before you do anything else.
That’s finding your single most important metric.
And you should place this metric above every other metric you have — measure it and keep it as your main focus.
A good most important metric is:
- Simple (so everyone at your company understands it immediately)
- Measurable (so it’s an actual number)
- Describing your business goals really well (so it actually matters)
- There is only one!
(Note: Actually there are a few more factors that make a good main metric… but let’s try to meet these four conditions first!)
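One way to keep a single metric simple and measurable is to define it exactly once, in code, so every report computes it the same way (avoiding the kind of silent formula drift from the spreadsheet story). Below is a minimal sketch using a hypothetical metric, weekly purchasing customers; your own most important metric will differ.

```python
from datetime import date, timedelta


def weekly_purchasing_customers(purchases, week_start):
    """Hypothetical main metric: distinct customers with at least one purchase
    in the seven days starting at week_start."""
    week_end = week_start + timedelta(days=7)
    return len({customer for customer, day in purchases if week_start <= day < week_end})


purchases = [
    ("c1", date(2024, 3, 4)),
    ("c1", date(2024, 3, 6)),   # repeat purchase, counted once
    ("c2", date(2024, 3, 10)),
    ("c3", date(2024, 3, 11)),  # falls in the following week
]
print(weekly_purchasing_customers(purchases, date(2024, 3, 4)))  # 2
```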
(5) Data Communication
Data and information translated for business people.
This is the step where most data science projects fail.
Interesting, isn’t it?
You can be the best analyst working with the finest data set in the world… But if you can’t communicate your findings efficiently, you will have zero impact.
That’s the nightmare of every data professional.
There are quite a few roadblocks here. We’ve seen all of them: data-sceptical co-workers, over-complicated presentations, unreadable charts…
The fact is that everyone at your company needs to be involved in order to build a culture where people can communicate and use data.
We have two specific recommendations for you:
Data professionals should hold presentations every week, not just about their recent findings but also about why data science is important for the company. Start with things like what a data analyst does, how data science works in the business, how colleagues can build self-service data solutions for themselves, and so on. Alignment between the consumers of the data and those tasked with producing it is a core tenet of the DGB.
Business people should educate data scientists, as well. They should help them to create and deliver better presentations.
Keep it simple.
Everything about your communication should be as simple as it can be!
- No fancy scientific words
- No complicated charts
- No endless emails
If you can show your data-driven takeaways in one line chart and explain them in one sentence, you should do it. Everyone will be happy about it.
(6) Data-Driven Decision Making
Why are data-driven managers important?
An important acronym for this step: HIPPO.
It stands for “highest paid person’s opinion.” For a very long time, it was a well-established method of business decision-making. Thanks to data science, that is no longer the case.
Not every manager is ready for this change. Ego, feelings and closed lines of communication can doom a project to failure, no matter how well it was designed and executed.
You can prevent this by establishing a data-driven company culture. Data science goes much further than clean data and dashboards. It is the explicit acknowledgement of a fundamental shift in your firm: you will use data as the backbone for most decision making.
The 3 Major Data Science Business Applications
Data analysis (step 4) is a very broad topic. There are so many opportunities to turn your data into value.
More specifically, in business, these are the three most common practical applications of data science:
(A) Business Analytics (aka Descriptive Analytics).
It answers the questions of “what has happened in the past?” and “where are we now?”
(E.g. reporting, measuring retention, finding the right user segments, funnel analysis, etc.)
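Measuring retention is a typical descriptive task. Below is a minimal sketch of a cohort retention table, assuming activity arrives as (user_id, month_index) pairs, where month_index is any consecutive integer month numbering (a hypothetical encoding, e.g. year * 12 + month). Each user's cohort is their first active month.

```python
from collections import defaultdict


def cohort_retention(activity, periods=3):
    """Share of each cohort active 0, 1, ..., periods - 1 months after joining.

    activity: iterable of (user_id, month_index) pairs.
    Returns {cohort_month: [retention at offset 0, 1, ...]}.
    """
    months_by_user = defaultdict(set)
    for user_id, month in activity:
        months_by_user[user_id].add(month)
    cohorts = defaultdict(list)
    for months in months_by_user.values():
        cohorts[min(months)].append(months)  # cohort = first active month
    table = {}
    for cohort, members in sorted(cohorts.items()):
        size = len(members)
        table[cohort] = [
            sum(1 for months in members if cohort + offset in months) / size
            for offset in range(periods)
        ]
    return table


activity = [
    ("a", 0), ("a", 1), ("a", 2),
    ("b", 0), ("b", 2),
    ("c", 0),
    ("d", 1), ("d", 2),
]
for cohort, rates in cohort_retention(activity).items():
    print(cohort, [f"{r:.0%}" for r in rates])
```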
(B) Predictive Analytics
It answers the question, “what will happen in the future?”
(E.g. how much will this investment return in the future, given past performance, assumed future variables, etc.?)
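The simplest possible predictive model fits a straight-line trend to past values and extrapolates it. Below is a deliberately minimal sketch using ordinary least squares; real predictive analytics would also model the assumed future variables and report uncertainty. The function names and the return figures are hypothetical.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(xs, ys, future_xs):
    """Extrapolate the fitted trend to future x values."""
    intercept, slope = fit_linear_trend(xs, ys)
    return [intercept + slope * x for x in future_xs]


# Hypothetical yearly returns (in thousands) for years 1 to 5.
years = [1, 2, 3, 4, 5]
returns = [4.1, 5.0, 5.8, 7.2, 7.9]
print(forecast(years, returns, [6, 7]))
```

Extrapolating a trend assumes the past pattern continues, so treat the output as a baseline to challenge, not a promise.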
(C) Data-Based Product
A product that works using your historical data.
(E.g. self-learning systems, AI, etc.)
Deciding which ones to focus on comes down to a fairly straightforward question: which one will offer you the highest ROI at each stage of your data journey?
Invest in business analytics and simple reports first. By answering the basics, you will generate tremendous business value: you will see more clearly and you will understand your audience better.