Predictive Marketing

Predicting Market Demand With Social Media

Social media is now being extended beyond its original applications into a tool for predicting the future. The exponential growth in social media has helped create a large body of content that reflects the trends, experiences, evaluations, and sentiment of the marketplace. It is becoming increasingly apparent that this content can be mined and analyzed to help predict the size of markets, the outcomes of marketing campaigns, and marketing ROI. In this post I’ll take a look at three ways in which data generated by social media has been used recently for Predictive Marketing.

Predicting Movie Box Office Results with Twitter

A group of researchers at HP Labs recently published a paper describing how they used data captured from Twitter posts to predict box-office revenue at the movies. The researchers extracted 2.89 million tweets from 1.2 million users referring to 24 different movies over a period of three months. For each tweet, the timestamp, author, and tweet text were collected and used for analysis. The researchers focused on what they termed the “critical period” – the week before and the two weeks after the release of a movie.

An initial analysis of the tweets revealed that:

  • Tweet volume built up during the week before the movie's release, peaked at the time of the release, and fell during the two weeks that followed.
  • The average number of tweets made by individuals about a particular movie was between 1 and 1.5.
  • The distribution of tweets by individuals showed that a handful of individuals made many tweets; the distribution followed the “long tail” power-law pattern that occurs frequently on the web.
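Both observations, the low average number of tweets per user and the concentration of tweets among a few heavy posters, come from a simple count over the author field. Below is a minimal sketch. It assumes each tweet is held in a hypothetical `Tweet` record carrying the three fields the researchers collected, and the sample tweets are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """Hypothetical record holding the three fields collected per tweet."""
    author: str
    timestamp: datetime
    text: str


def tweets_per_user_stats(tweets, top_n=10):
    """Return (average tweets per user, share of all tweets posted by the top_n authors)."""
    tweets = list(tweets)
    counts = Counter(t.author for t in tweets)
    if not counts:
        return 0.0, 0.0
    average = len(tweets) / len(counts)
    top_total = sum(n for _, n in counts.most_common(top_n))
    return average, top_total / len(tweets)


# Invented sample: one heavy poster and two people who tweet once each.
sample = [
    Tweet("ann", datetime(2010, 3, 1, 12), "Can't wait for this one"),
    Tweet("ann", datetime(2010, 3, 2, 9), "Tickets booked"),
    Tweet("ann", datetime(2010, 3, 5, 22), "Loved it"),
    Tweet("bob", datetime(2010, 3, 5, 23), "Pretty good"),
    Tweet("cat", datetime(2010, 3, 6, 14), "Not for me"),
]
average, top_share = tweets_per_user_stats(sample, top_n=1)
print(round(average, 2), top_share)  # 1.67 0.6
```

On a real data set, a large `top_share` for a small `top_n` is the long-tail signature described above.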

The team then proceeded to analyze the data for predictive power. First, what didn’t prove to be very good predictors of box office success:

  • Prior to the release of a movie, studios promote the film heavily via TV, print, news releases, interviews with the stars, and trailer videos. The researchers classified tweets according to whether they contained URLs, since such links could point to trailers, movie reviews, or other PR about the movie. It turned out that although 22% to 40% of the tweets contained URLs, these tweets were only mildly predictive of box office success.
  • Retweets made up 11% to 12% of the tweets and were even less predictive of box office success. This is surprising, given that retweets are indicative of word-of-mouth.
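Classifying tweets in these two ways needs only a pattern match on the text. Below is a minimal sketch. It assumes a tweet counts as containing a URL if it has an `http://` or `https://` link, and counts as a retweet if it begins with the conventional `RT @user` prefix. The study's exact classification rules are not described here, so both heuristics and the sample texts are assumptions.

```python
import re

# Assumed heuristics, not the study's documented rules.
URL_PATTERN = re.compile(r"https?://\S+")
RETWEET_PATTERN = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)


def share_with_urls(texts):
    """Fraction of tweet texts containing at least one http(s) link."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if URL_PATTERN.search(t)) / len(texts)


def share_of_retweets(texts):
    """Fraction of tweet texts that start with an 'RT @user' prefix."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if RETWEET_PATTERN.match(t)) / len(texts)


texts = [
    "Trailer is out http://example.com/trailer",
    "RT @fan: can't wait for Friday",
    "Opening night tickets booked!",
    "Review: https://example.com/review worth it",
]
print(share_with_urls(texts), share_of_retweets(texts))  # 0.5 0.25
```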

There were three factors that proved to be powerful predictors of box office revenue:

  • The tweet rate, or the number of tweets about a given movie per hour. This is indicative of the overall attention and interest that the movie is generating. This factor is particularly important in predicting box office revenue for the opening weekend.
  • Positive sentiment about the movie. The researchers created a customized method for analyzing positive and negative sentiment about movies for the purposes of this study. Sentiment proved to be an important factor in predicting box office revenue in the weeks after the opening.
  • Distribution, or the number of theaters in which the film screened. The wider the distribution, the greater the opportunity for revenue generation.
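Tweet rate is a plain count per hour over a time window, and sentiment can be summarized once each tweet carries a label. Below is a minimal sketch. It assumes timestamps are `datetime` objects and that the labels `"positive"`, `"negative"`, and `"neutral"` come from some separate classifier. The researchers' customized sentiment method is not reproduced here, summarizing sentiment as a positive-to-negative ratio is an assumption of this sketch, and the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def critical_period(release):
    """Window studied by the researchers: one week before to two weeks after release."""
    return release - timedelta(weeks=1), release + timedelta(weeks=2)


def tweet_rate(timestamps, start, end):
    """Average tweets per hour within the half-open window [start, end)."""
    hours = (end - start).total_seconds() / 3600
    in_window = sum(1 for ts in timestamps if start <= ts < end)
    return in_window / hours


def pn_ratio(labels):
    """Positive-to-negative ratio over labels produced by an assumed sentiment classifier."""
    positive = labels.count("positive")
    negative = labels.count("negative")
    return positive / negative if negative else float("inf")


# Hypothetical release date and tweet timestamps, for illustration only.
release = datetime(2010, 3, 5)
start, end = critical_period(release)
stamps = [release - timedelta(hours=h) for h in range(1, 49, 2)]  # 24 tweets in the 48 h before release
print(tweet_rate(stamps, release - timedelta(days=2), release))  # 0.5
print(pn_ratio(["positive", "positive", "negative", "neutral"]))  # 2.0
```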

Using these factors in a predictive model, the team demonstrated that they could predict opening weekend box office revenue with 97.3% accuracy. This compared favorably with a well-known prediction tool for movies, the Hollywood Stock Exchange (HSX), which achieved 96.5% accuracy. The results for these two techniques are shown in the graph below:
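A model of this kind can be fitted with ordinary least squares and scored with R², or with adjusted R², which penalizes extra predictors. The 97.3% and 96.5% figures are most naturally read as goodness-of-fit scores of this sort rather than as a percentage of correct guesses, though that reading is an assumption here. The sketch below uses only the standard library and treats tweet rate and theater count as the two predictors. The movie numbers are invented, and this is not the researchers' code.

```python
def fit_linear(features, targets):
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in x] for x in features]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coef, x):
    """Linear prediction: intercept plus weighted predictors."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r_squared(features, targets, coef):
    """Share of the variance in targets explained by the fitted model."""
    mean = sum(targets) / len(targets)
    ss_res = sum((y - predict(coef, x)) ** 2 for x, y in zip(features, targets))
    ss_tot = sum((y - mean) ** 2 for y in targets)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R² penalized for p predictors given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Invented movies: (tweets per hour before release, theater count) -> opening revenue in $M.
movies = [(120.0, 2500), (45.0, 1800), (300.0, 3600), (80.0, 2100), (210.0, 3300), (30.0, 1200)]
revenue = [30.1, 11.8, 71.0, 19.5, 52.3, 6.9]
coef = fit_linear(movies, revenue)
r2 = r_squared(movies, revenue, coef)
print(coef, r2, adjusted_r_squared(r2, n=len(movies), p=2))
```

Solving the normal equations directly keeps the sketch dependency-free. For larger or poorly conditioned data sets, a library least-squares routine would be the safer choice.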

The exciting implication of this study is not this particular application to box office revenue, but that a model has now been developed for using social media to predict a wide variety of outcomes, from product sales to elections. The researchers conclude:

While in this study we focused on the problem of predicting box office revenues of movies for the sake of having a clear metric of comparison with other methods, this method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.

Using Social Media to Predict Election Results

The model that the HP Labs team developed for forecasting outcomes from social media is one of the few that are customized, well-defined, and mathematically rigorous. However, that hasn't stopped people without such a powerful tool from predicting outcomes from social media trends.

One of the most stunning election outcomes of the past few years was Scott Brown's victory over Martha Coakley in the special election for the Massachusetts U.S. Senate seat left vacant by the death of Ted Kennedy. Larry Kim published a blog post five days before the election forecasting the upset for Scott Brown.

At the time, conventional polls suggested that the race was too close to call, even though the Democrat Coakley had been the early front runner and seemed to have a lock on the seat, given the nature of the Massachusetts electorate. However, while Coakley coasted through the campaign, the hard work and grassroots effort of the Brown campaign paid huge dividends. Kim's analysis of social media trends showed that Brown had developed a huge advantage in social media presence:

  • 10:1 Advantage in YouTube video views
  • 4:1 Advantage in Facebook fans
  • 3:1 Advantage in Twitter mentions
  • 10:1 Advantage in estimated web traffic

The trend in web traffic as measured by Alexa was particularly telling:

While the data looked overwhelming, Kim lacked a quantitative predictive model like the one employed by the HP Labs team, and to his credit he cautioned that heavy users of social media were undoubtedly a biased sample, perhaps not representative of the electorate as a whole. Still, the trends looked so overwhelming that, on the basis of this data, Kim concluded that Scott Brown was headed for victory. Five days later, Brown proved the prognostication accurate.

Using Social Media to Predict Fashion Trends

One final example of the predictive power of social media involves the notoriously fickle and unpredictable fashion world.

Luke Brynley-Jones of Our Social Times reports that Geoff Watts from Stylesignal has developed a new social media monitoring tool that helps track new fashion trends. The tool monitors the sites of opinion leaders in the fashion world, and the data it collects is then analyzed offline to predict fashion trends. Case studies on the site claim that StyleSignal has helped correctly predict the colors, shapes, and styles that went on to set trends.


It has become increasingly clear that social media can be used to predict future events with accuracy. Now that predictive models have been developed for quantifying and measuring the accuracy of predictions, you should expect to see explosive growth in the use of social media in forecasting.

The Best Time of Day to Tweet

Twitter has become an increasingly popular and important tool for businesses to keep in touch with their customers. Twitter is a medium unlike any other. Each tweet has a limited life span: if it is not read shortly after it is posted, the chances of it ever being read drop exponentially. The constant stream of new tweets from the people each twitterer follows makes it unlikely that a tweet will be read once it is a few hours old, since few twitterers capture all of their tweets in RSS feeds or take the time to examine all the latest tweets from more than a handful of individuals. For a business hoping to broadcast a message that is read by as many followers as possible, timing is of the essence.

So then, what is the best time of day to tweet? There have been several approaches to answering this question:

As you can see, there are a lot of different opinions about the best time to tweet. In order to develop the best answer possible to this question, I collected data over the course of several weeks for a business whose followers consist primarily of event professionals.

The data set consisted of several thousand tweets, including the username, the time and day of the tweet, and the tweet itself. For the purpose of this analysis, I assumed that the best indicator of a given twitterer's engagement was whether or not they had tweeted within a given hour. To determine the best time of day to tweet, then, what matters most is not the number of tweets posted at a particular time but the number of unique users posting them. Here's the data, in Eastern Time:

For this group of followers, there are actually two optimal hours to tweet: 10:00 – 11:00 AM and 12:00 – 1:00 PM. Tweets during these two hours reach 23.7% of the total number of followers, an 18% advantage over the next best time, 11:00 AM – 12:00 PM, and a 31% advantage over 1:00 PM – 2:00 PM. These gains in available audience are significant for a business with thousands of followers.
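The measure described above, unique followers active in each hour as a share of all followers, needs only a set of usernames per hour. Below is a minimal sketch with several assumptions. The log is a list of `(username, timestamp)` pairs whose timestamps are already converted to Eastern Time. “Tweeted within a given hour” is read as pooling all days, so each follower counts at most once per hour of the day. The follower count, sample log, and function names are hypothetical. `relative_advantage` expresses a comparison the way the percentages above do: 0.18 means the best hour's audience is 18% larger.

```python
from collections import defaultdict
from datetime import datetime


def hourly_reach(tweets, follower_count):
    """Share of followers who tweeted at least once in each hour of the day.

    Timestamps are assumed to be in Eastern Time already. Each user is counted
    at most once per hour, so a prolific tweeter does not inflate an hour's score.
    """
    active = defaultdict(set)
    for user, ts in tweets:
        active[ts.hour].add(user)
    return {hour: len(users) / follower_count for hour, users in sorted(active.items())}


def relative_advantage(best, other):
    """How much larger the best hour's reach is than another hour's (0.18 = 18% larger)."""
    return best / other - 1


# Hypothetical log for a group of 10 followers.
log = [
    ("ann", datetime(2010, 5, 3, 10, 15)),
    ("ann", datetime(2010, 5, 3, 10, 45)),  # same user, same hour: counted once
    ("bob", datetime(2010, 5, 4, 10, 5)),
    ("cat", datetime(2010, 5, 4, 16, 1)),
]
reach = hourly_reach(log, follower_count=10)
best_hour = max(reach, key=reach.get)
print(reach, best_hour)  # {10: 0.2, 16: 0.1} 10
```

Running `hourly_reach` over your own followers' tweets gives the per-hour profile needed to find your best hour.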

Notice that, for this particular group, a tweet during the hour beginning at 9:00 AM, the beginning of Gary McAffrey's time window, would only reach an available audience that is two-thirds the size of that available during 10:00 – 11:00 AM and 12:00 – 1:00 PM. Malcolm Cole's suggestion of 4:01 PM reaches an available audience that is less than half the size – only 41% – of that of the best time to tweet. Guy Kawasaki's formula of four tweets varied over 8 – 12 hour intervals is a hit-or-miss proposition. In this particular case, the Social Media Guide is right on the money – the hour beginning at 9:00 AM Pacific/12:00 PM Eastern is best.

But does this pattern hold for every group of followers? Or does each group of followers have a unique pattern, a sort of “time fingerprint”? To answer this question, I examined a second group of followers of a CRM company. Here’s the data, once again expressed in Eastern Time:

This group is far different! The group following the CRM company is much more likely to be active during the morning hours, and its activity is more evenly distributed over the entire day. As a result, a tweet to this group reaches a maximum of 10.8% of the total available audience, compared with a peak of 23.7% for the group of event professionals. The CRM group reaches its maximum at 11:00 AM – 12:00 PM, rather than the hour before or after, as in the previous case. So although it is a close approximation, the Social Media Guide's recommendation of 12:00 PM Eastern Time would reach an audience 17% smaller than this group's peak hour of 11:00 AM – 12:00 PM.
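The comparisons in this post are stated two ways: as an advantage of the best hour over another hour (the 18% and 31% figures) and as a shortfall of another hour relative to the best (the 17% figure). The two measures are related but not interchangeable. In the equations below, r_b stands for the reach of the best hour and r_o for the reach of another hour; this notation is introduced here for illustration.

```latex
A = \frac{r_b}{r_o} - 1, \qquad
S = 1 - \frac{r_o}{r_b} = \frac{A}{1 + A}, \qquad
A = \frac{S}{1 - S}
```

For example, an 18% advantage corresponds to a shortfall of 0.18 / 1.18 ≈ 15.3%, and the 17% shortfall for the CRM group corresponds to an advantage of 0.17 / 0.83 ≈ 20.5% for its peak hour. Likewise, an audience “two-thirds the size” is a one-third shortfall, which is a 50% advantage for the best hour.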

As these two data sets demonstrate, there is no one best time to tweet for every business. Each business has a unique set of followers with their own Twitter “time fingerprint”. You have to track the habits of your own set of followers in order to determine the best single time of day for your business to tweet.

Develop this graph for your own set of followers. How much different is your group compared to these two?

One of the most important insights from these two examples is that at any given time, you can only reach 10% – 24% of your followers with a single tweet. In a future post, I’ll examine what percentage of a group of followers can be reached with multiple tweets.

