Predictive Modeling in a World of Uncertainty

We’ve already seen how COVID-19 has changed consumer behaviors and business outcomes in the short term, but what about the long term? The constant fluctuation of case rates, openings and shutdowns makes that hard to determine, but one change is likely to have both short-term and long-term impacts: the invalidation of pre-pandemic consumer data.

When that data is invalidated, so is any model that relies on it to predict future consumer behavior, including predictive analytics models. That is a problem for marketers and data scientists who have relied on these models to help fulfill a company’s goals.

What is predictive modeling?

Predictive modeling is a process that involves gathering data and statistics to predict outcomes based on past behaviors, patterns and trends. It is used to forecast anything from television ratings to corporate earnings. In marketing, predictive modeling is often used to anticipate customer behavior to prevent churn and attract customers with the highest lifetime value.

How do predictive analytics models work?

To better understand how predictive analytics models work, let’s take a look at how these models are built—plus, what their limitations are.

How to build predictive models

Although the process of building predictive models is complex, it can generally be broken down into six essential steps (a brief code sketch follows the list):

  1. Define the model’s goals based on a business’s goals. The model will aim to predict if and how an organization’s goals can be achieved.

  2. Determine what data is needed and how to get it. A company will need to look at what data is currently available and what gaps in existing data need to be filled for an accurate model to be built.

  3. Gather all necessary data. Common sources of data for these models include transaction data, CRM data, customer service data, survey or polling data, digital marketing and advertising data, economic data, machine-generated data (such as data from sensors), geographical data and web traffic data.

  4. Analyze the data. There are a number of predictive analytics techniques and models, so how the data is analyzed will depend on what model is used. It should be analyzed in a way that tracks back to the organization’s goals outlined in the initial step.

  5. Build the model. Establish a hypothesis, then create a test model that includes the relevant variables and check whether the results support the hypothesis.

  6. Revise models as needed. Predictive models should be revised as additional data becomes available.
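
To make these steps concrete, here is a minimal sketch of the workflow in Python, assuming a hypothetical churn-prevention goal and a CSV export of customer history; the file name and column names are illustrative, not taken from any specific platform.

```python
# A minimal sketch of the six steps above. File and column names are
# hypothetical; the business goal (step 1) is assumed to be churn prevention.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Steps 2-3: decide what data is needed and gather it.
df = pd.read_csv("customer_history.csv")
X = df[["recency_days", "order_count", "avg_order_value"]]
y = df["churned"]

# Steps 4-5: analyze the data and build a first test model.
model = Pipeline([("scale", StandardScaler()),
                  ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Cross-validated AUC:", scores.mean())

# Step 6: revise by rerunning this evaluation as new data becomes available,
# promoting the updated model only if its score holds up.
```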

Today’s brands need to be more flexible and agile, and they need to run experiments faster. The only way to know whether your offers are effective in this ever-changing environment is to continuously track and measure their success.

Limitations of predictive analytics models

Predictive analytics models require large volumes of data covering a range of activities, which may be hard to come by. Even if a company has enough data to build these models, the models themselves may be too simplistic to be accurate. They may not account for variables that could alter a consumer’s behavior and break previous patterns, anything from changes in the weather to shifts in the overall economy.

Because customer behaviors are ever-evolving, a model that is accurate at one point in time may no longer be accurate a few weeks or months later. Take, for example, the coronavirus pandemic, which created shifts in consumer behavior that could not have been anticipated. These big shifts have likely made predictive analytics models created pre-pandemic largely irrelevant.

Predictive Analytics Model Types

Although predictive analytics models certainly have their limits, it’s worth understanding the types of models that exist, as well as the pros and cons of each. Here are some of the main examples of predictive model types.

Classification Model

The classification model separates data into categories based on what it learns from historical data. This predictive modeling type is typically used for broad analysis and to answer basic “yes” or “no” questions, such as “Is this retail customer likely to churn?” or “Is this banking customer making a fraudulent transaction?”
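
As a rough illustration, a classification model for the fraud question above might look like the following sketch; the features, data and fraud pattern are invented purely for demonstration.

```python
# A toy classification model answering a yes/no question: "Is this banking
# transaction fraudulent?" The data is synthetic and the pattern is contrived.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Features: [amount, hour_of_day, is_foreign_merchant]
X_train = rng.random((500, 3)) * [1000, 24, 1]
y_train = (X_train[:, 0] > 800).astype(int)   # pretend large amounts were fraud

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[950, 2, 1]]))   # expected [1]: "yes, likely fraudulent"
print(clf.predict([[40, 14, 0]]))   # expected [0]: "no"
```

Retraining with new data, one of the pros listed below, is simply a matter of calling fit again on the refreshed dataset.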

Pros of the classification model:

  • Can be applied to many different industries
  • Can be easily retrained with new data

Cons of the classification model:

  • Unable to make complex predictions

Clustering Model

The clustering model sorts data into groups based on similar attributes. For example, the model could group loan applicants based on their demographic attributes, financial status or geographic areas. This would allow a lender to devise strategies to target each group.
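
A minimal sketch of that grouping, assuming a small table of loan applicants with invented attribute values, might look like this; in practice the attributes would come from the lender’s own data.

```python
# Cluster loan applicants into segments based on similar attributes.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

applicants = pd.DataFrame({
    "age":           [23, 45, 31, 52, 38, 27],
    "annual_income": [32000, 88000, 54000, 120000, 61000, 41000],
    "loan_amount":   [5000, 20000, 12000, 35000, 15000, 8000],
})

X = StandardScaler().fit_transform(applicants)          # put attributes on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
applicants["segment"] = kmeans.labels_                  # no prior "labels" needed
print(applicants)
```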

Pros of the clustering model:

  • A faster way to target consumers than individual marketing campaigns
  • Allows marketers to develop targeted strategies at scale
  • Can identify consumer patterns based on shared traits
  • Can analyze data without prior “labels” (an unsupervised approach)

Cons of the clustering model:

  • Assumes that all individuals in a given cluster will behave similarly, which is not always the case

Outlier Model

The outlier model identifies anomalous data entries within a dataset. It can home in on anomalous figures on their own or in conjunction with other numbers and categories.

For example, data from a customer support center could be scanned for spikes, which may indicate a product failure and the need for a recall. On an individual level, a spike in banking transactions could be an indicator of fraud. The outlier model can analyze not only the amount of a transaction but also its location, time and nature to determine whether it’s likely to be fraudulent. If an individual made an out-of-the-norm $1,000 purchase on a big-ticket item such as a new TV, it may not be flagged; if the same $1,000 was spent on clothing, a category where the customer typically doesn’t spend that much, it might be flagged as fraud.
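
A hedged sketch of that idea, using scikit-learn’s IsolationForest as one possible outlier detector on invented transaction data, could look like this:

```python
# Flag anomalous transactions using both the amount and the purchase category.
# The data is synthetic and the model choice is illustrative.
import pandas as pd
from sklearn.ensemble import IsolationForest

tx = pd.DataFrame({
    "amount":   [42, 55, 38, 61, 47, 1000, 52],
    "category": ["grocery", "grocery", "clothing", "grocery",
                 "clothing", "clothing", "grocery"],
})
features = pd.get_dummies(tx, columns=["category"])     # encode category as numbers

model = IsolationForest(contamination=0.15, random_state=0).fit(features)
tx["flag"] = model.predict(features)                    # -1 = outlier, 1 = normal
print(tx[tx["flag"] == -1])    # the $1,000 clothing purchase should stand out
```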

Pros of the outlier model:

  • Useful for making predictions in the retail and finance industries
  • Can analyze a number of categories at once to determine true outliers

Cons of the outlier model:

  • Doesn’t account for external factors that may cause outliers, like a global pandemic

Time Series Model

With a time series model, time is the input parameter for the data points captured. The model uses data from the past year to predict future behaviors. It is more accurate than simply taking averages because it accounts for seasonal changes as well as yearly events that could affect behaviors. For example, if a salon owner wants to know how many people will visit the salon in the next 90 days, looking at the same period from the previous year may paint a clearer picture than looking at the previous 90 days alone, since consumer behavior is usually not linear.
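
One simple way to capture that intuition is a “seasonal naive” forecast, which reuses the same calendar window from a year earlier; the sketch below fabricates two years of daily visit counts just to show the mechanics.

```python
# Seasonal naive forecast: predict the next 90 days from the same window last year.
# The visit counts are synthetic, generated with a yearly seasonal pattern.
import numpy as np
import pandas as pd

dates = pd.date_range("2019-01-01", "2020-12-31", freq="D")
visits = pd.Series(50 + 20 * np.sin(2 * np.pi * dates.dayofyear / 365), index=dates)

forecast_window = pd.date_range("2021-01-01", periods=90, freq="D")
forecast = visits.loc[forecast_window - pd.DateOffset(years=1)].to_numpy()
print(forecast[:7].round(1))    # predicted visits for the first forecast week
```

Full time series models (ARIMA, exponential smoothing and the like) add trend and error components on top of this, but they rest on the same assumption: that the prior year is representative of the next one.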

Pros of the time series model:

  • Can be utilized by a wide range of industries, including retailers, hospitals and customer service providers
  • Can better model exponential growth than simply taking averages over time
  • Can forecast for multiple projects or multiple regions at the same time

Cons of the time series model:

  • Assumes that previous years will be an accurate predictor of future years, which is not necessarily the case

Forecast Model

The forecast model estimates the numeric value for new data based on learnings from historical data. Using historical numerical data, these models can predict customer behaviors over various time periods, from an hour to an entire sales quarter.

The forecast model is able to ingest multiple input parameters to account for circumstances that could change an outcome. For example, if a theme park wants to predict the number of attendees it is likely to receive in the next month, the model can take into account holiday weekends that may increase attendance, weather patterns that could increase or decrease visitors, and seasonal illnesses that may reduce attendees.
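
As a rough sketch, a forecast model with several input parameters can be framed as a regression on those circumstances; the attendance data below is synthetic and the effect sizes are made up.

```python
# Forecast attendance from several inputs: holiday flag, temperature, illness level.
# All data is synthetic, purely to show multiple parameters feeding one forecast.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 365
is_holiday  = rng.integers(0, 2, n)
temperature = rng.normal(22, 8, n)
flu_index   = rng.random(n)
attendance  = (2000 + 1500 * is_holiday + 40 * temperature
               - 800 * flu_index + rng.normal(0, 100, n))

X = np.column_stack([is_holiday, temperature, flu_index])
model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, attendance)

# Predict attendance for a warm holiday with little seasonal illness.
print(model.predict([[1, 25.0, 0.1]]))
```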

Pros of the forecast model:

  • Models can be created any time historical data is available, so it’s applicable to a wide variety of industries
  • Able to take different variables into account

Cons of the forecast model:

  • Makes the assumption that past behaviors will be indicative of future behaviors

What is the biggest assumption in predictive modeling?

As you can see from the “cons” of all of the various model types, predictive modeling is all based on the assumption that historical data will provide an accurate picture of what the future will look like, which, as we’ve seen over the past year, certainly is not always the case. These models may be effective for predicting outcomes in a specific instance that mirrors a previous situation, but their findings can’t always be generalized to effectively predict outcomes when the situation changes.

Why historical data is invalidating predictive models

Consumer behaviors have shifted in a number of ways since March of 2020:

  • Demand has moved from brick-and-mortar to online in nearly every industry
  • Safety has become a key driver in some sectors that were previously value-driven
  • Other sectors are seeing high price consciousness due to economic uncertainty
  • Brands and products have left shoppers’ consideration sets, and new ones have entered
  • Consumer affinities and sentiments have changed as a result of all of the above

On top of these factors, continued lockdowns, reopenings and vaccination efforts will keep behaviors changing throughout 2021 at a minimum. For businesses, this means the data their segmentation relies on is probably outdated, so even if the model itself still aligns with their business objectives, the predictions it generates are likely to be inaccurate.

For example, if a company’s segments were defined using clustering algorithms, combining data from 2019 and 2020 may result in very different clusters and segments than if the data is limited to the first half of the pandemic.
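
A minimal sketch of that recalibration, assuming a hypothetical transaction export with a date column, is simply to filter to the pandemic-era window before rebuilding the segments:

```python
# Rebuild clusters from pandemic-era transactions only (file and columns hypothetical).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])
tx = tx[tx["order_date"] >= "2020-03-01"]               # drop pre-pandemic rows

per_customer = tx.groupby("customer_id")["amount"].agg(
    order_count="count", avg_order_value="mean")
X = StandardScaler().fit_transform(per_customer)
per_customer["segment"] = KMeans(n_clusters=4, n_init=10,
                                 random_state=0).fit_predict(X)
```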

Inaccurate segmentation makes it hard for marketers to deliver relevant offers to customers—at a time when customers expect greater relevance than ever before. This can negatively affect loyalty marketing programs and customer retention, both of which are critical to business recovery in the pandemic.

How to counteract the data deficit

To improve the quality of the data being gathered in the near term, the Harvard Business Review recommends going beyond transaction data and leveraging “COVID-19-aware” data sources to understand what’s motivating customers now.

For example, many airline customers may not yet be ready to book tickets, or in the case of California’s recent lockdown, non-essential travelers may be legally prevented from booking hotel rooms. But they may still be interacting with emails or customer apps, using websites, calling customer service or engaging with social media channels. All of these interactions can help airline and hotel chain data scientists begin to understand new traveler preferences, behavior patterns and travel readiness.

Other possible steps include limiting data to a specific short-term period and running consumer pulse surveys, but with the status of the pandemic continually in flux, these may become outdated as soon as the next lockdown, reopening or regional change occurs.

To recalibrate segmentation models and help marketing reestablish relevance to customers, data scientists will need to make their data modeling process more flexible and adaptive. That means they’ll need to collaborate with their marketing teams to create a process of rapid experimentation, learning and optimization.

3 steps to more agile data models

Even before the pandemic, experts were calling for data modeling to increase agility to better meet business needs. In January, just-in-time data modeling topped Dataversity’s list of data modeling trends for 2020. A key component of this trend is agile data modeling, which focuses on using “a minimally sufficient design” and “the right data model for specific situations.”

To achieve greater agility in the coming year and beyond, there are three steps your data science team should look at taking right away:

1. Integrate data collection and marketing technologies

Whether you’re collecting data from individual sources or using a customer data platform (CDP), you need to make that data actionable. The fastest way to do that is by integrating your data platforms with your marketing technology stack so that marketing can execute on customer data directly.

Of course, this is easier said than done, since marketing usually has a sizable MarTech stack with many different tools, like CRM platforms, email service providers, marketing automation platforms and offer fulfillment platforms. Consider a third-party solution that can create a seamless connection between all these tools and your data source(s).
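
As one hedged illustration of what that integration can look like at its simplest, segment assignments exported from a data platform could be pushed to a marketing tool over its API; the endpoint, token and payload shape below are hypothetical placeholders, and a real integration would follow the specific vendor’s documentation.

```python
# Push fresh segment assignments to a marketing tool. The endpoint and payload
# are hypothetical placeholders, not any specific vendor's API.
import pandas as pd
import requests

segments = pd.read_csv("segments_export.csv")   # e.g. columns: customer_id, segment
payload = segments.to_dict(orient="records")

resp = requests.post(
    "https://marketing.example.com/api/v1/segments/bulk",   # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer <API_TOKEN>"},
    timeout=30,
)
resp.raise_for_status()
```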

2. Act on your data quickly by running micro-experiments

Once your data sources and marketing tools are linked, help marketing run small experiments to learn how your customer preferences and motivations have changed. For example, rather than sending a marketing offer to a whole segment, target just one attribute that only applies to a small number of customers, such as customers who took a specific action within a recent time frame.
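
A minimal sketch of setting up such a micro-experiment, assuming a customer table with a recent-activity column (names are illustrative), might look like this:

```python
# Target only customers who took a specific recent action, then split them into
# an offer group and a holdout group. Column names are illustrative.
import numpy as np
import pandas as pd

customers = pd.read_csv("customers.csv", parse_dates=["last_app_login"])
recent = customers[customers["last_app_login"] >= "2021-01-01"]

rng = np.random.default_rng(7)
recent = recent.assign(
    group=np.where(rng.random(len(recent)) < 0.5, "offer", "holdout"))

# Send the offer to the "offer" group only; later, compare redemption rates
# between the groups and feed the result back into the segmentation model.
```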

The data from these micro-experiments should be funneled back to your modeling algorithms so it can quickly inform your segmentation models. Make sure marketing has the right tools in place to rapidly scale up any experiments that are working, and continue to run new ones to catch changing behavioral data early.

3. Optimize based on your experiment results

As the results of your experimentation feed back into your data platforms, you’ll be able to continually update your models and segments, delivering the just-in-time data modeling the post-COVID era requires.

To ensure that marketing can leverage your updated models before customer behavior changes again, consider an offer optimization platform that can turn your predictions into marketing offers automatically.

Data agility will decide who wins the market

When it comes to segmentation models, the lack of usable data during COVID-19 has leveled the playing field between leading brands and their challengers. The companies that are best able to recalibrate their models to be agile and adaptive will win the lion’s share of customers, no matter how consumer behavior changes.

To learn more about working with post-COVID customer data, read our recent blog post on customer data platforms.