We are awash in a sea of data. Thanks to web analytics tools, CRM systems, and social media, we have more data than ever about the behavior of customers and prospects. What is often lacking are the knowledge and skills necessary to turn this data into useful information.
Both are on display in a brilliant study conducted by Seth Stephens-Davidowitz of Harvard. As reported in the Wall Street Journal, using freely available data from Google Insights, skillful research, and clever thinking, he was able to determine that in the 2008 presidential election, racial attitudes cost President Obama 3 to 5 percentage points of the national popular vote. His method of reaching this conclusion, which we’ll review here, illustrates techniques that any marketer can use to gain insight into topics such as product demand, buying attitudes, geographic preferences, and buyer demographics.
Stephens-Davidowitz performed the study because surveys are notoriously unreliable at capturing voters’ true racial attitudes: survey participants are highly likely to misreport their attitudes out of embarrassment. Google-based measures of racial bias are more likely to reflect voters’ attitudes accurately, since people search Google online and are likely to be alone when they do. In addition, Google search data are available at finer geographic levels, are more recent, and aggregate information from larger samples than typical surveys.
The method used in the study was as follows:
- Choose a search term that represents the underlying attitude. In this case, Stephens-Davidowitz used a well-known racial epithet beginning with “n” as the representative search term.
- He had to make sure that the term represented a strong proxy for racial bias; he did this by:
- Examining some of the output from Google Insights, which lists the top searches related to the term. The list of related searches made it clear that the searches were motivated by racial bias.
- Verifying that Google search volumes correlate well with the demographics of the people one would expect to search a term more often. For example, the percentage of a state’s residents who say they believe in God explains 65% of the variation in search volume for the word “God”. The table below gives further examples:
- Finally, confirming that the major potential bias in racial attitude survey data, misreporting due to embarrassment, is unlikely to significantly bias Google data. As mentioned previously, the conditions under which people search (online and likely alone) limit this concern. The following table documents substantial search volume for various terms that researchers suspect may be underreported in surveys.
- He then used Google Insights to measure geographic variation in searches for the term, and found wide variation across media markets:
- Stephens-Davidowitz next sought to estimate how this bias translated into votes. To arrive at a first estimate, he used linear regression to compare voting results by media market in the Obama – McCain election with results in the Kerry – Bush election.
- To verify that his measure of racial bias was a robust predictor of the difference in voting patterns between the two elections, he then added other variables known to affect voting outcomes to the analysis.
Stephens-Davidowitz concludes:
Estimating the effect of racial animus on voting is complicated by surveyed individuals’ propensity to misreport socially unacceptable attitudes. This paper sidesteps surveys using area-level Google search data and administrative voting records. I find that racial animus played a major role in the 2008 election. Relative to the attitudes of the most tolerant area, racial animus cost Obama 3 to 5 percentage points of national popular vote.
More details are available in the full study, The Effects of Racial Animus on Voting: Evidence Using Google Search Data. The same method can be applied to a host of marketing questions, from forecasting product demand to gauging buying attitudes, geographic preferences, and buyer demographics.
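The two regression steps in the method above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Stephens-Davidowitz’s actual specification: the media-market numbers are invented, the animus index stands in for a search-based measure of racial animus, and the control is a hypothetical variable known to affect voting. The sketch fits ordinary least squares through the normal equations, first with the animus index alone and then with the control added, and converts the animus coefficient into an implied vote cost relative to the most tolerant market, mirroring how the study frames its result.

```python
from statistics import mean


def ols(predictors, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    predictors: one tuple of predictor values per observation.
    An intercept column is prepended; returns [intercept, b1, b2, ...].
    """
    X = [[1.0, *row] for row in predictors]
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
         + [sum(X[t][i] * y[t] for t in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical media markets (illustrative numbers, NOT data from the study):
# (animus_index, kerry_2004_pct, obama_2008_pct, control)
markets = [
    (20, 48.0, 54.5, 1.0),
    (40, 47.0, 51.5, 0.0),
    (60, 45.0, 49.0, 2.0),
    (80, 44.0, 46.0, 1.0),
    (50, 46.0, 51.25, 3.0),
]

# Dependent variable: Obama's 2008 vote share minus Kerry's 2004 vote share
swing = [obama - kerry for _, kerry, obama, _ in markets]

# Step 1: first estimate, using the animus index alone
simple = ols([(animus,) for animus, *_ in markets], swing)

# Step 2: add a (hypothetical) control variable known to affect voting
controlled = ols([(animus, ctrl) for animus, _, _, ctrl in markets], swing)
animus_coef = controlled[1]

# Implied vote cost in each market relative to the most tolerant market
least_biased = min(animus for animus, *_ in markets)
costs = [-animus_coef * (animus - least_biased) for animus, *_ in markets]
avg_cost = mean(costs)  # unweighted; a real estimate would weight by voters

print("animus only:", simple)
print("with control:", controlled)
print("average implied cost (points):", round(avg_cost, 2))
```

Because the toy swing values were constructed to lie exactly on a plane, the controlled fit recovers its coefficients exactly; real data would be noisy, and a statistics package would also report standard errors, which this sketch omits.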
New research is underscoring the influence of social networks in marketing. Researchers at Telenor, a mobile phone carrier in Scandinavia, developed a map of social connections based on calling patterns between subscribers to analyze the adoption of the iPhone since 2007. The research showed that an individual with just one iPhone-owning friend was three times more likely to own one themselves than someone whose friends had no iPhones. Individuals with two friends who had iPhones were more than five times as likely to have purchased an iPhone.
What is groundbreaking about this research is not the realization that friends and colleagues influence what you buy, but the unprecedented ability in today’s connected world to track, measure, and quantify the effects of social influence. This newfound capability calls for a dramatic overhaul of the way that businesses determine the value of their customers.
The Lifetime Value of a Customer
Determining the lifetime value of a customer has long been the starting point for calculating the ROI of a marketing campaign. The lifetime value of a customer is defined as the net present value of the profit a business will realize from the average new customer’s purchases over a period of years. This number is critical because it indicates how much it is worth spending to acquire a given customer. Armed with this information, a business can manage its marketing programs not as an expense, or for short-term profits, but as a long-term business investment.
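The definition above translates directly into a discounted sum. A minimal sketch, assuming a constant annual profit per customer, a fixed annual retention rate, and a fixed discount rate; these are simplifying assumptions, and the input figures below are hypothetical:

```python
def lifetime_value(annual_profit, retention, discount_rate, years):
    """Net present value of the profit from one newly acquired customer.

    Assumptions (simplifications, not part of the definition itself):
    profit is constant and arrives at the end of each year, and the
    customer is still active in year t with probability retention ** (t - 1).
    """
    return sum(
        annual_profit * retention ** (t - 1) / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )


# Hypothetical inputs: $200 profit per year, 80% annual retention,
# 10% discount rate, 5-year horizon
clv = lifetime_value(annual_profit=200, retention=0.8, discount_rate=0.10, years=5)
print(f"Lifetime value: ${clv:,.2f}")
```

Read this way, the result is roughly the most you could spend to acquire such a customer and still break even over the chosen horizon.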
A New Metric – The Network Value of a Customer
As the research on iPhone adoption illustrates, with the rise in the popularity of social networks, it has become increasingly clear that the true value of a customer goes beyond how much he or she might buy from you directly. Traditional measures of customer value ignore the influence a customer may have on how much others buy. For example, if a customer buys your product, and then, based on his recommendation, three of his colleagues buy your product as well, his effective value to you has quadrupled. On the other hand, if a prospect makes his decision based purely on what others tell him about your product, you will be better off spending your marketing dollars on his colleagues.
For marketers, this means that the lifetime value of a customer can no longer be considered to capture a customer’s true value. Advances in understanding how social influence affects purchase decisions have led to the creation of a new metric: the network value of a customer. The network value of a customer is the expected increase in sales to others that results from marketing to that customer.
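Here is a minimal sketch of that definition, assuming a one-hop model: each of a customer’s direct contacts has a known value to you and an estimated probability of buying because of that customer. Both inputs are hypothetical and would have to be estimated from your own data; influence that spreads further (friends of friends) is ignored.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    value: float           # the contact's own value to you (e.g. lifetime value)
    influence_prob: float  # assumed chance they buy because this customer did


def network_value(contacts):
    """Expected extra sales to others from marketing to one customer.

    One-hop simplification: only direct contacts are counted.
    """
    return sum(c.value * c.influence_prob for c in contacts)


def total_value(own_value, contacts):
    """A customer's direct value plus their network value."""
    return own_value + network_value(contacts)


# The example from the text: a customer whose recommendation leads
# three colleagues (each worth the same amount) to buy.
colleagues = [Contact(value=1000, influence_prob=1.0) for _ in range(3)]
print(total_value(1000, colleagues) / 1000)  # 4.0, i.e. value quadrupled
```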
The Factors That Determine The Network Value of a Customer
Which customers have a high network value? Few businesses have access to the kind of data the Telenor researchers had at their disposal: billions of call records. However, by considering the characteristics that customers with high network value share, you can identify data to collect that will help you find and target your own customers with the highest network value. Customers with high network value share these common characteristics:
- A high level of satisfaction with your product
- A high likelihood of recommending your product to others
- A high degree of connection to other potential buyers
- A high level of influence as an opinion leader
How to Target Customers With High Network Value
Even if, unlike the researchers at Telenor, you don’t have access to billions of records detailing your customers’ social connections and behavior, you can easily collect data that will help you target your customers with the highest network value. Ways to do so include:
- Collect a Net Promoter Score from each customer – The metric is simple to collect and straightforward to determine, as described on netpromoter.com:
By asking one simple question — How likely is it that you would recommend [Company X] to a friend or colleague? — you can track these groups and get a clear measure of your company’s performance through its customers’ eyes. Customers respond on a 0-to-10 point rating scale and are categorized as follows:
- Promoters (score 9-10) are loyal enthusiasts who will keep buying and refer others, fueling growth.
- Passives (score 7-8) are satisfied but unenthusiastic customers who are vulnerable to competitive offerings.
- Detractors (score 0-6) are unhappy customers who can damage your brand and impede growth through negative word-of-mouth.
With this one metric you can capture the first two characteristics of a customer with high network value – they 1) have a high level of satisfaction with your product, and 2) are likely to recommend it to others.
- Collect social network information about your customers – many companies are starting to ask customers for their Twitter and/or Facebook usernames, in addition to other contact information such as email address. The very fact that a customer is willing to give you this information is an excellent indicator that the customer is actively engaged with your product. It also allows you to invite them to follow or friend you on Twitter and Facebook. On Twitter, you can also follow them and collect valuable publicly available information about them: how many friends and followers they have, how many tweets they have posted, and their bio. This gives you a measure of the third characteristic of high-network-value customers: how well connected they are to other potential buyers.
- Perform a social network analysis of your Twitter and Facebook followers – you can analyze your own Facebook and Twitter followers to determine which customers:
- have the highest number of connections
- are most likely to pass key marketing messages along to their followers
- have the highest influence and are opinion leaders
This analysis fills in the final piece you need to get a handle on the network value of a customer: the fourth criterion, whether the customer is highly influential and an opinion leader. Now you’re ready to start testing and scoring groups of customers according to their network value.
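The survey and network steps above can be combined in a short sketch. The Net Promoter Score here uses the conventional formula (percentage of promoters minus percentage of detractors), which the quoted passage does not spell out. For influence we swap in PageRank, one common centrality measure for follower graphs; it is our choice for illustration, not a method the post prescribes. The customer names, survey answers, and follower graph are hypothetical and assumed to have been collected already; how likely a customer is to pass messages along would require retweet or share data, which is not modeled.

```python
from collections import defaultdict


def nps_category(score):
    """Classify a 0-10 answer to 'How likely are you to recommend us?'."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores):
    """Percentage of promoters minus percentage of detractors (-100 to 100)."""
    categories = [nps_category(s) for s in scores]
    if not categories:
        raise ValueError("no scores given")
    net = categories.count("promoter") - categories.count("detractor")
    return 100 * net / len(categories)


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (follower, followed) edges.

    Rank flows from each follower to the accounts it follows, so accounts
    followed by many (or by highly ranked) accounts score higher.
    """
    nodes = {n for edge in edges for n in edge}
    size = len(nodes)
    out_links = defaultdict(list)
    for follower, followed in edges:
        out_links[follower].append(followed)
    rank = {n: 1 / size for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / size for n in nodes}
        for n in nodes:
            targets = out_links.get(n)
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Account follows no one: spread its rank evenly over all nodes
                for t in nodes:
                    new_rank[t] += damping * rank[n] / size
        rank = new_rank
    return rank


# Hypothetical data: survey answers and a follower graph among customers
nps_answers = {"ann": 6, "bob": 10, "cat": 9, "dan": 8, "eve": 9}
edges = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("eve", "bob"), ("bob", "eve"), ("ann", "eve"),
]

ranks = pagerank(edges)
followers = defaultdict(int)  # in-degree: number of followers per account
for _, followed in edges:
    followers[followed] += 1

print("Company NPS:", net_promoter_score(nps_answers.values()))

# Candidate high-network-value customers: promoters, most influential first
candidates = sorted(
    (c for c, s in nps_answers.items() if nps_category(s) == "promoter"),
    key=lambda c: ranks.get(c, 0.0),
    reverse=True,
)
for c in candidates:
    print(f"{c}: score {nps_answers[c]}, {followers[c]} followers, "
          f"PageRank {ranks.get(c, 0.0):.3f}")
```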
Optimize Your Marketing Programs
Clearly, ignoring the network value of a customer can lead to suboptimal marketing decisions. Once you have collected the information you need to assess the network value of your customers, you can model both the likelihood that a given customer will buy from you and the influence that customer has on others’ buying decisions. You can then select a subset of your customers and estimate not just how much they will buy from you, but the total revenue they might generate through their influence over others. This lets you identify the set of customers to market to that will generate the highest ROI.
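One way to put this selection into practice, sketched under heavy assumptions: every customer has a hypothetical value, a probability of buying when marketed to, and probabilities of persuading specific other customers, and all of these events are treated as independent. A greedy heuristic then repeatedly adds the customer whose marketing cost buys the most additional expected revenue, counting one hop of influence. Greedy selection approximates the highest-ROI set; it does not guarantee it.

```python
def expected_revenue(selected, customers):
    """Expected revenue from marketing to the customers in `selected`.

    Each customer dict has 'value' (revenue if they buy), 'p_buy' (chance
    they buy when marketed to) and 'influence' (other customer IDs mapped to
    the chance this customer's purchase persuades them). One hop only; all
    events are assumed independent, and influence IDs must be in `customers`.
    """
    p_not = {cid: 1.0 for cid in customers}  # chance each customer does NOT buy
    for cid in selected:
        c = customers[cid]
        p_not[cid] *= 1 - c["p_buy"]
        for other, p in c["influence"].items():
            # Influence only happens if the marketed customer actually buys
            p_not[other] *= 1 - c["p_buy"] * p
    return sum(customers[cid]["value"] * (1 - p_not[cid]) for cid in customers)


def greedy_select(customers, costs, budget):
    """Greedily add the customer with the best marginal gain per unit cost."""
    selected, spent, current = [], 0.0, 0.0
    while True:
        best, best_ratio = None, 0.0
        for cid in customers:
            if cid in selected or spent + costs[cid] > budget:
                continue
            gain = expected_revenue(selected + [cid], customers) - current
            ratio = gain / costs[cid]
            if ratio > best_ratio:
                best, best_ratio = cid, ratio
        if best is None:
            return selected
        selected.append(best)
        spent += costs[best]
        current = expected_revenue(selected, customers)


# Hypothetical customers: a well-connected "hub" and two of its contacts
customers = {
    "hub": {"value": 100, "p_buy": 0.5, "influence": {"x": 0.8, "y": 0.8}},
    "x": {"value": 100, "p_buy": 0.6, "influence": {}},
    "y": {"value": 100, "p_buy": 0.6, "influence": {}},
}
costs = {"hub": 10, "x": 10, "y": 10}
print(greedy_select(customers, costs, budget=10))
```

In this toy example the hub wins even though it is individually less likely to buy than either contact, because its expected influence on the other two outweighs the difference; that is exactly the kind of customer a lifetime-value-only view would undervalue.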
Social media is now being extended beyond its original applications into a tool for predicting the future. The exponential growth in social media has helped create a large body of content that reflects the trends, experiences, evaluations, and sentiment of the marketplace. It is becoming increasingly apparent that this content can be mined and analyzed to help predict the size of markets, the outcomes of marketing campaigns, and marketing ROI. In this post I’ll take a look at three ways in which data generated by social media has been used recently for Predictive Marketing.
Predicting Movie Box Office Results with Twitter
A group of researchers at HP Labs recently published a paper describing how they used data captured from Twitter posts to predict box-office revenue at the movies. The researchers extracted 2.89 million tweets from 1.2 million users referring to 24 different movies over a period of three months. For each tweet, the timestamp, author, and tweet text were collected and used for analysis. The researchers focused on what they termed the “critical period” – the week before and the two weeks after the release of a movie.
An initial analysis of the tweets revealed that:
- The tweets built up in volume the week before the movie release; peaked at the time of the release; and fell during the two weeks following the release.
- The average number of tweets made by individuals about a particular movie was between 1 and 1.5.
- The distribution of tweets by individuals showed that a handful of individuals made many tweets; the distribution followed the “long tail” power-law pattern that frequently occurs on the web.
The team then proceeded to analyze the data for predictive power. First, what didn’t prove to be very good predictors of box office success:
- Prior to a movie’s release, studios promote the film heavily via TV, print, news releases, interviews with the stars, and trailer videos. The researchers classified tweets according to whether they contained URLs, which could point to trailers, movie reviews, or other publicity about the movie. Although 22% to 40% of the tweets contained URLs, such tweets were only mildly predictive of box office success.
- The percentage of retweets was in the 11%-12% range, and was even less predictive of box office success. This is surprising, given that retweets are indicative of word-of-mouth.
There were three factors that proved to be powerful predictors of box office revenue:
- The tweet rate, or the number of tweets about a given movie per hour. This is indicative of the overall attention and interest that the movie is generating. This factor is particularly important in predicting opening-weekend box office revenue.
- Positive sentiment about the movie. The researchers created a customized method for analyzing positive and negative sentiment about movies for the purposes of this study. Sentiment proved to be an important factor in predicting box office revenue in the weeks after the opening.
- Distribution, or the number of theaters in which the film screened. The wider the distribution, the more opportunity that existed for revenue generation.
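The first two factors can be computed from collected tweets along the following lines. This is a minimal sketch, assuming each tweet already carries a timestamp and a positive, negative, or neutral label from some sentiment classifier (the HP Labs team built its own, which is not reproduced here). The positive-to-negative ratio is one simple way to summarize sentiment, and the movie, release date, and tweets below are hypothetical.

```python
from datetime import datetime, timedelta


def tweet_rate(timestamps, start, end):
    """Average number of tweets per hour in the window [start, end)."""
    hours = (end - start) / timedelta(hours=1)
    count = sum(1 for ts in timestamps if start <= ts < end)
    return count / hours


def positive_negative_ratio(labels):
    """Positive-to-negative ratio of sentiment labels.

    Labels are assumed to come from a separate sentiment classifier
    (not shown); neutral labels are ignored.
    """
    positive = labels.count("positive")
    negative = labels.count("negative")
    return positive / negative if negative else float("inf")


# Hypothetical tweets about one movie: (timestamp, sentiment label)
release = datetime(2010, 3, 5)
tweets = [
    (release - timedelta(days=2), "positive"),
    (release - timedelta(days=1), "neutral"),
    (release - timedelta(hours=3), "positive"),
    (release + timedelta(hours=1), "negative"),
]
timestamps = [ts for ts, _ in tweets]
labels = [label for _, label in tweets]

# Tweet rate over the week before release (part of the "critical period")
pre_release_rate = tweet_rate(timestamps, release - timedelta(days=7), release)
print(f"Tweets/hour in the week before release: {pre_release_rate:.4f}")
print(f"Positive/negative ratio: {positive_negative_ratio(labels)}")
```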
Using tweet rate and theater count in a linear regression model, the team was able to predict opening weekend box office revenue with an adjusted R² of 0.973, meaning the model accounted for roughly 97% of the variation in revenue across the movies studied. This compared favorably with a well-known prediction tool for movies, the Hollywood Stock Exchange (HSX), which achieved an adjusted R² of 0.965. The results for these two techniques are shown in the graph below:
The exciting implication of this study is not this particular application of social media to predicting box office revenue, but that a model has now been developed for using social media to predict a wide variety of outcomes, from product sales to elections. The researchers conclude:
While in this study we focused on the problem of predicting box office revenues of movies for the sake of having a clear metric of comparison with other methods, this method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.
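To make the modeling step concrete, here is one way to write a regression of this kind, as a sketch rather than the paper’s exact specification: the symbols are our own, the tweet-rate window is left generic, and the sentiment term mentioned for later weeks is omitted. How well such a model fits is commonly summarized by the adjusted R², the share of variation in revenue the model explains after correcting for the number of predictors.

```latex
\begin{align}
  \text{Revenue}_m &= \beta_0 + \beta_1\,\text{TweetRate}_m + \beta_2\,\text{Theaters}_m + \varepsilon_m \\
  \text{TweetRate}_m &= \frac{\text{number of tweets about movie } m \text{ in the window}}{\text{window length in hours}} \\
  R^2_{\text{adj}} &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\end{align}
```

Here m indexes movies, n is the number of movies in the sample, p is the number of predictors (two in this sketch), and R² is the ordinary coefficient of determination of the fitted model.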
Using Social Media to Predict Election Results
The model the HP Labs team developed for forecasting outcomes from social media is one of the few that are customized, well defined, and mathematically rigorous. However, the lack of such a powerful tool hasn’t stopped others from predicting outcomes from social media trends.
One of the most stunning election outcomes of the past few years was Scott Brown’s victory over Martha Coakley in the special election to fill the Massachusetts U.S. Senate seat left vacant by the death of Ted Kennedy. Larry Kim published a blog post five days before the election forecasting the upset for Scott Brown.
At the time, conventional polls suggested that the race was too close to call, even though the Democrat, Coakley, had been the early front runner and seemed to have a lock on the seat, given the nature of the Massachusetts electorate. However, while Coakley coasted through the campaign, the hard work and grass-roots effort of the Brown campaign paid huge dividends. Kim’s analysis of social media trends showed that Brown had developed a huge advantage in social media presence:
- 10:1 Advantage in YouTube video views
- 4:1 Advantage in Facebook fans
- 3:1 Advantage in Twitter mentions
- 10:1 Advantage in estimated web traffic
The trend in web traffic as measured by Alexa was particularly telling:
While the data looked overwhelming, Kim, to his credit, lacking a quantitative predictive model like the one employed by the HP Labs team, cautioned that heavy users of social media were undoubtedly a biased sample, perhaps not representative of the electorate as a whole. Still, the trends looked so overwhelming that, on the basis of this data, Kim concluded that Scott Brown was headed for victory. Five days later, Brown proved the prediction accurate.
Using Social Media to Predict Fashion Trends
One final example of the predictive power of social media involves the notoriously fickle and unpredictable fashion world.
Luke Brynley-Jones of Our Social Times reports that Geoff Watts from Stylesignal has developed a new social media monitoring tool that helps track emerging fashion trends. The tool monitors the sites of opinion leaders in the fashion world, and the data it collects is then analyzed offline to predict fashion trends. Case studies on the site claim that StyleSignal has helped correctly predict the colors, shapes, and styles that went on to set trends.
It has become increasingly clear that social media can be used to predict future events with accuracy. Now that predictive models have been developed for quantifying and measuring the accuracy of predictions, you should expect to see explosive growth in the use of social media in forecasting.