The Devil’s in the Details: Hidden Dangers of Predictive Analytics
By Chris Matty, CEO of Versium Analytics
Big data continues to pave the way for predictive analytics, the latest not-so-secret weapon in the CMO’s arsenal. According to a report from Forrester Consulting, predictive analytics is on the roadmap for 89 percent of marketers. As predictive analytics evolve further toward machine learning and artificial intelligence, they allow marketers to become modern-day soothsayers, delivering what the customer wants, where they’re looking for it, when they need it, increasing the likelihood of a conversion.
It’s not only marketers who are excited by the prospect of smarter targeting; executives in the C-suite also praise AI’s ability to improve efficiency and productivity. From marketing to sales, predictive analytics have the power to transform the customer experience by creating truly customized touch points like never before. But is it too good to be true?
While the industry praises its capabilities, many marketers are skeptical. They’re often left wondering how they can take advantage of all predictive analytics has to offer, or whether predictive analytics can even do what’s advertised. Marketers have every right to be skeptical. There are a number of success stories, but there are at least as many cautionary tales from companies that spent a lot of money on a solution that didn’t deliver.
To avoid becoming the next predictive analytics horror story, marketers must work to understand the data and confront three common challenges: 1) data consolidation, 2) data normalization and 3) data hygiene.
1. Data Ponds Muddy the Data Lake
Take any large retailer, for example: They likely have an online store presence, brick and mortar locations, and maybe even offshoots of other brands or outlet stores. Each business is likely tracking and storing data about its customers (such as contact details and shopping habits) in individual data ponds, creating silos of information with disparate or conflicting characteristics.
In this situation, customer representation in a system is critical and has the potential to impact or diminish available insights if represented inconsistently. For example, an individual could have made three different transactions: the first as “Robert Smith” shopping in the store, the second as “Bob Smith” shopping online and the third as “Rob S.” shopping at the outlet store.
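To make the “Robert Smith” example concrete, here is a minimal Python sketch of one way the three records might be reduced to a shared matching key. The nickname table and the `name_key` helper are hypothetical illustrations, not part of any particular product, and matching on a last initial alone is far too loose for real use; in practice the key would be combined with an address, email or phone number.

```python
# Hypothetical nickname table; a real project would use a much larger, curated list.
NICKNAMES = {"bob": "robert", "rob": "robert", "bobby": "robert"}


def name_key(full_name: str) -> tuple[str, str]:
    """Return (canonical first name, last initial) as a loose matching key."""
    parts = full_name.replace(".", "").lower().split()
    if not parts:
        raise ValueError("empty name")
    first, last = parts[0], parts[-1]
    # Map known nicknames to one canonical form; leave unknown names unchanged.
    return NICKNAMES.get(first, first), last[0]


records = ["Robert Smith", "Bob Smith", "Rob S."]
keys = {name_key(r) for r in records}
print(keys)  # all three records collapse to a single key
```

Because the key deliberately throws information away, it is better treated as a way to propose candidate matches for review than as proof that two records belong to the same person.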
Marketers can face additional challenges, including nuances in shipping details and address standardization, that skew data intelligence. In one case the shipping address could be for personal use; in another, a product could be shipped to a friend. Understanding the difference and mapping addresses to the right customers can make a difference in understanding true buying propensity. Moreover, if an address is represented differently in two records, most databases will not recognize the two records as being tied to the same individual or household: for example, “123 Main Street, St. Louis, MO” versus “123 Main St, Saint Louis, Missouri.” Inconsistent address formats and spelling errors create data matching inaccuracies and lead to lower quality insights. Combine these frequently occurring factors with time sensitivities such as seasonality, and the data intelligence may often be misinterpreted.
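As a rough illustration of address standardization, the Python sketch below lowercases an address, strips punctuation and maps a few long forms to abbreviations so that two spellings of the same address compare equal. The lookup tables, the `normalize_address` and `similarity` helpers, and the sample addresses are assumptions made for this example; real postal standards cover far more cases. The fuzzy comparison uses the standard library’s `difflib.SequenceMatcher` as one possible way to tolerate small spelling errors.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny lookup tables; real standards list many more.
STREET_SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd"}
STATES = {"missouri": "mo"}
CITY_PREFIXES = {"saint": "st"}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation and abbreviate known tokens."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    normalized = []
    for token in tokens:
        token = STREET_SUFFIXES.get(token, token)
        token = STATES.get(token, token)
        token = CITY_PREFIXES.get(token, token)
        normalized.append(token)
    return " ".join(normalized)


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1 after normalization."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


a = normalize_address("123 Main Street, St. Louis, MO")
b = normalize_address("123 Main St, Saint Louis, Missouri")
print(a == b)  # the two spellings normalize to the same string
```

Normalization handles formatting differences; a misspelled street name would still fail an exact comparison, which is where a similarity threshold (chosen and validated against real data) could help.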
“The data provides a great baseline and starting point, but various things can skew the results,” notes Bethanie Nonami, co-CEO of marketing agency MarleyNonami. “For one, subjectivity is hard to measure. Plus, if you have seasonality with your campaigns, like Black Friday for retail, this one weekend may give your algorithm a false boost that it would predict might happen again next weekend.”
If a marketer puts this data into a predictive analytics model, the model becomes skewed because the data isn’t cohesive, and the system begins predicting the wrong things. Being able to link related records, remove erroneous data such as in the examples outlined above, and normalize for time will make the models and predictions more accurate.
Small or large company, data ponds and silos are a pervasive issue. And it seems the bigger the customer, the bigger the problem.
“As product, marketing and sales are growing, they’re becoming more siloed,” said Mostafa El-Bermawy, vice president of marketing at Workzone, a provider of web-based project management solutions. “We recognize that silos are happening and try to fight it with our culture, with visibility and transparency. Each team knows what the other is working on, and we have one dashboard that we all look at. But we’re a smaller team, so it’s less of an issue for us. It’s a bigger struggle for large enterprises.”
2. Data Variety Threatens Integrity
Further complicating matters are human error and inconsistencies that occur when inputting data. These elements create data variety that threatens integrity. From incompatible entries, like someone tracking sales of an item and entering “2K” into a data field versus “2,000,” to leaving fields blank, a data entry problem creates data modeling issues because it skews the trend information. Ultimately, biases in the data break predictive models.
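The “2K” versus “2,000” problem can be sketched in a few lines of Python. The `parse_quantity` helper and its suffix table below are hypothetical; the point is only that every variant is converted to one numeric form, and that blank or unreadable entries are returned as `None` so they remain visible as missing values instead of being silently treated as zero.

```python
import re

# Hypothetical suffix table: "k" for thousands, "m" for millions.
MULTIPLIERS = {"": 1, "k": 1_000, "m": 1_000_000}


def parse_quantity(raw):
    """Convert entries such as '2K' or '2,000' to an int; return None if unusable."""
    if raw is None:
        return None
    text = raw.strip().lower().replace(",", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([km]?)", text)
    if not match:
        return None  # blank or unparseable: flag for review rather than guess
    number, suffix = match.groups()
    return round(float(number) * MULTIPLIERS[suffix])


for entry in ["2K", "2,000", "", "two thousand"]:
    print(repr(entry), "->", parse_quantity(entry))
```

Keeping unparseable entries as `None` is a design choice for this sketch: it lets an analyst count how many records were affected before deciding whether to correct or exclude them.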
“When you have a human relevancy system, you should always factor in mistakes,” said El-Bermawy. “When someone makes a mistake inputting data, it impacts our numbers going forward. Humans do make mistakes, and that’s the scary thing. Right now we’re using tools to help us predict the future, and tools–like Hubspot–to help us ensure that we have healthy data.”
As a result, ensuring data accuracy and consistency is key. Like Workzone, marketers in most cases will need to employ a professional services component in their program to help ensure that the data they’re working with is a true representation of what, or who, they’re tracking. In our experience, 80 percent of the modeling effort is making sure the data is properly prepped and preprocessed before it gets loaded into a predictive analytics tool. Sophisticated ETL (extract, transform, load) is crucial in the journey toward extracting the full promise of advanced predictive analytics.
3. Dirty Data Infects Results
Paramount to the success of a company’s predictive analytics model is improving the quality of data the marketer is working with to derive insights. In our work, we’ve found up to 40 percent of data being used for predictive analytics models is simply bad data.
This is often a more pervasive problem among online marketers. They trudge through a lot of questionable data, also known as “dirty data,” from online registration forms. When customers fill out a registration form, they often enter shortened or false name information to speed up getting past a form and into the portal, or drop in a fake email address to avoid getting follow-up emails. Although there is not necessarily malicious intent behind these behaviors, they create a data validation problem.
Luckily, there are tools to help marketers solve this problem. Some even go so far as to identify brand new email addresses that were created solely for dropping into a registration form and flag those suspicious entries for the marketer to explore, or remove altogether from the dataset. Excluding a real person or real data is far less damaging to a data model than including a fake person or fake data. Therefore, identity validation is a key step in the process.
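A very simplified Python sketch of this kind of screening is shown below. It does not detect newly created email addresses, which requires an external validation service; it only checks for placeholder names, malformed addresses and a known throwaway domain. The `flag_registration` helper, the domain list and the placeholder list are all hypothetical and intentionally incomplete.

```python
import re

# Hypothetical, incomplete lists used only for illustration.
DISPOSABLE_DOMAINS = {"mailinator.com"}
PLACEHOLDER_NAMES = {"test", "asdf", "none", "n/a"}
# Loose shape check only; it does not prove the mailbox exists.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def flag_registration(name: str, email: str) -> list[str]:
    """Return the reasons a registration looks suspicious (empty list if none)."""
    reasons = []
    cleaned_name = name.strip().lower()
    cleaned_email = email.strip().lower()
    if len(cleaned_name) < 2 or cleaned_name in PLACEHOLDER_NAMES:
        reasons.append("placeholder name")
    if not EMAIL_PATTERN.fullmatch(cleaned_email):
        reasons.append("malformed email")
    elif cleaned_email.rsplit("@", 1)[1] in DISPOSABLE_DOMAINS:
        reasons.append("disposable email domain")
    return reasons


print(flag_registration("Robert Smith", "robert@example.com"))
print(flag_registration("asdf", "x@mailinator.com"))
```

Returning reasons instead of a simple yes or no keeps the decision with the marketer, who can review flagged entries or exclude them from the modeling dataset.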
“We try to capture as much data as we can, at every touchpoint with a customer or a prospect,” said El-Bermawy. “When you have the right structure of data or relevancy of data, and you don’t have any data integrity problems, you can compare data month over month or year over year and see a bigger picture. With the right structure and integrity of data, we have the privilege of drawing better comparisons and learning which segments or keywords work better [for our customers].”
Is the data ready?
The best predictive analytics results are only achieved under optimal conditions, requiring research and hard work before engaging a predictive analytics vendor. Consolidating, normalizing and measuring the data’s integrity are all paramount steps to take in order to gain trustworthy insights. Inaccurate and unreliable data only leads to inaccurate and unreliable predictions.
“Most agencies that have not embraced data are not only doing a disservice to their firm, but to their clients,” said Nonami. “You can’t throw spaghetti on the wall to see what sticks and waste time and money over and over again. We have access to more data than any other time in history, so you don’t have to guess any longer.”
As you begin to evaluate vendors and integrate predictive analytics into your marketing roadmaps, consider taking the necessary steps to address these common issues at the start, ensuring you can rely on your predictions from day one.