In my last post, I reviewed the ability of some of the best-known Twitter users to extend their reach through viral marketing. One well-known approach to viral marketing is to focus your message on a small number of highly influential people, who then help start a word-of-mouth chain reaction that broadcasts your message to a wide audience at low cost. This strategy requires that you can identify the most influential individuals in your target market. New research is now available to help with the identification process. Four researchers at the Max Planck Institute for Software Systems recently published a landmark paper investigating how to measure and identify influence in social networks.
Measures of Influence
The researchers focused on Twitter users. With the cooperation of Twitter, they compiled a dataset of more than 1.7 billion tweets from 54 million Twitter users, who were connected by nearly 2 billion follow links.
The researchers compared three different measures of user influence on Twitter:
- Indegree Influence, or the number of followers that a user has, an indicator of that user’s popularity.
- Retweet Influence, the number of retweets in the dataset containing a user’s name, a measure of their ability to propagate a message among their followers.
- Mention Influence, or the number of tweets containing a user’s name, indicating the ability of the user to initiate and maintain conversations with others.
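To make the three measures concrete, here is a minimal sketch of how they could be counted from raw data. The toy data, the function name `influence_measures`, and the parsing rules are my own assumptions for illustration (in particular, treating a tweet that starts with "RT @name" as a retweet and counting it separately from mentions); the paper's actual extraction rules may differ.

```python
from collections import Counter

# Hypothetical toy data: follow links as (follower, followed) pairs,
# and tweets as (author, text) pairs.
follows = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]
tweets = [
    ("ann", "RT @bob: great post"),
    ("cat", "@bob what do you think?"),
    ("bob", "RT @ann: new data"),
    ("cat", "RT @bob: great post"),
]

def influence_measures(follows, tweets):
    """Return (indegree, retweets, mentions) counters keyed by username."""
    # Indegree: number of followers each user has.
    indegree = Counter(followed for _, followed in follows)
    retweets = Counter()
    mentions = Counter()
    for _, text in tweets:
        words = text.split()
        # Assumption: a retweet starts with "RT @name".
        if len(words) > 1 and words[0] == "RT" and words[1].startswith("@"):
            retweets[words[1].strip("@:")] += 1
        else:
            # Count each mentioned user once per tweet.
            names = {w.strip("@:?,.!") for w in words if w.startswith("@")}
            names.discard("")
            mentions.update(names)
    return indegree, retweets, mentions

indegree, retweets, mentions = influence_measures(follows, tweets)
print(indegree["bob"], retweets["bob"], mentions["bob"])
```

In this toy example "bob" has two followers, is retweeted twice, and is mentioned once, so the same user can rank differently on each measure.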
The Million Follower Fallacy
One of the most interesting questions tackled by the study was to what degree the three measures of influence were correlated. The researchers focused on the 6 million most active Twitter users, and ranked each one according to each of the three measures. They then examined the correlation between the rankings, shown in the following table:
Correlation ranges on a scale of -1 to 1; a perfect positive correlation is 1 (meaning that a high rank in one measure tends to occur along with a high rank in another measure); a perfect negative correlation is -1 (meaning that a high rank in one measure tends to occur with a low rank in another measure); no correlation is indicated by a score close to 0. All three measures of influence were positively correlated. However, ties in rank among the lowest ranked in the 6 million active Twitter users artificially generated the relatively high correlation seen in the column “All” in the above table. The researchers therefore isolated the top 10% and top 1% of users based on their number of followers, and examined the correlations between the three measures of influence. The researchers reached the following conclusion:
After this filtering step, the top users showed a strong correlation in their retweet influence and mention influence…This means that, in general, users who get mentioned often get retweeted often, and vice versa. Indegree, however, was not related to the other measures. We conclude that the most connected users are not necessarily the most influential when it comes to engaging one’s audience in conversations and having one’s messages spread.
This phenomenon has been dubbed “the million follower fallacy” and is one of the most important conclusions of the study: if your goal is to identify the users who are most likely to repeat your message to others in a viral marketing campaign, don’t look for the users in your target market with the most followers, look for the users with the most retweets and mentions.
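For readers who want to run this kind of rank comparison on their own data, the sketch below assumes Spearman's rank correlation, a standard way to compare two rankings; the post above does not depend on that particular choice. The function names are my own. Tied values share an average rank, which is also why a large block of tied low-ranked users can push the correlation up.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one list is entirely tied
    return cov / (var_x * var_y) ** 0.5

# Hypothetical follower and retweet counts for five users.
followers = [900, 500, 300, 200, 100]
retweet_counts = [5, 40, 30, 2, 1]
print(round(spearman(followers, retweet_counts), 2))
```

Feeding in follower counts and retweet counts for the same set of users gives a single number between -1 and 1 that can be read as described above.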
As you move even further up the rankings, the overlap between the top 100 ranked users according to the three different measures of influence becomes smaller:
There were 233 distinct users that made the top 100 ranking in one or more of the three measures, and 67 that appeared on more than one of the top 100 lists.
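The overlap counts above are simple set arithmetic. A minimal sketch, using hypothetical scores for four users and a top-2 cutoff instead of the top 100 (the function names are mine, not the paper's):

```python
from collections import Counter

def top_k(scores, k):
    """Return the set of the k highest-scoring users."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def overlap_summary(score_maps, k):
    """Return (distinct users on any top-k list, users on more than one list)."""
    appearances = Counter()
    for scores in score_maps:
        appearances.update(top_k(scores, k))
    distinct = len(appearances)
    on_multiple = sum(1 for count in appearances.values() if count > 1)
    return distinct, on_multiple

# Hypothetical scores for the three measures.
indegree = {"a": 9, "b": 8, "c": 1, "d": 0}
retweets = {"a": 1, "b": 7, "c": 9, "d": 0}
mentions = {"a": 0, "b": 1, "c": 8, "d": 9}
print(overlap_summary([indegree, retweets, mentions], k=2))
```

With three lists of 100 names each there are 300 slots in total, so the fewer distinct users there are, the more the lists overlap.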
Influence Across Different Topics
Another key question the team investigated was whether a user’s influence varied by different genres of topics. To address this question, they examined top-ranked influencers for three different topics which were among the most mentioned during 2009: the Iranian presidential elections, the H1N1 flu virus, and the death of Michael Jackson. Among the set of users who tweeted about any of these topics, only 2%, a set of 13,219 users, tweeted about all three topics, demonstrating the diversity of the topic genres.
Once again, the team looked at the correlations between the rankings on the topics, this time looking specifically at retweets and mentions among the most popular users, as measured by Indegree Influence.
Given the relatively high correlations among the most popular users in their ranks for retweets and mentions across these diverse topics, the researchers concluded that opinion leaders can hold sway over a wide variety of topics. This means that opinion leaders can help spread a message outside their area of expertise. This is consistent with recent efforts to insert advertising links into popular users’ tweets. The fact that the degree of influence of users follows a long-tail (power law) distribution also leads the authors to conclude that it is more economical to target top influentials to kick-start a viral marketing campaign, rather than a massive number of less influential users.
How to Become Influential
The team next looked at three groups of users who tweeted about only one of the topics, to determine what factors make ordinary users more influential. These three groups of users had from 3 to 180 times fewer followers than the highest ranked influencers. In comparison to the top ranked influencers across all topics, this set of users that tweeted about only one of the topics saw their influence rise to a much greater degree over an eight month time period in 2009.
This led the authors to conclude that through concerted effort and a focus on a single topic, users had the greatest chance of increasing their influence over time.
The key findings of the research are as follows:
- It is easier to kick-start a viral campaign by focusing on top influencers, rather than large numbers of individuals with a small degree of influence. This follows from the fact that all three measures of influence fall into a long-tail (power) distribution.
- If you are targeting individuals with a message with a view to extending your reach through viral marketing, the number of followers an individual has is less important than the number of times they retweet, and in turn, are retweeted. The number of followers a Twitter user has may indicate popularity, but this measure has a weak correlation to retweets and mentions.
- The most influential users can hold influence over a variety of topics, as measured by retweets and mentions.
- Ordinary Twitter users can best gain influence by focusing on a single topic, and consistently including links to useful and engaging content in their tweets, as opposed to focusing on conversations with other users.
Like all good research, the analysis presented by the authors suggests further areas for investigation:
- Are the same influence patterns evident in business-oriented communications, as opposed to the general interest topics specifically investigated in this research?
- The research focused on the users with the most retweets and mentions. The data was not normalized to account for the number of their followers. What patterns emerge when the data is adjusted in this manner?
- Are retweets driven by the influence of the user, the content of the tweet, or the content of the link? If it is driven by all three factors, what is the relative importance of each?
- How can the research be used to predict the outcomes of social media campaigns?
The researchers are currently working with Twitter to make their entire dataset available to researchers. While the dataset is not yet available, you can check their website for updates on their data sharing plan. If and when the dataset becomes available, we can look forward to further detailed investigation of user influence in social media.
One of the many attractions of Social Media is the opportunity to amplify your message through viral marketing. In theory, if you can deliver the right message to a select number of the right people, you can reach thousands, or even millions of people on a shoestring budget. In previous posts, I have analyzed how to maximize the effective reach of your message on Twitter by deploying your tweets during the best time of day to your followers, and repeating that message at strategic times to extend your reach even further. The objective of both of these techniques was to reach as many of your followers as possible with your message. Let’s now examine the opportunity of extending your reach beyond your group of followers through viral marketing on Twitter.
The Lure of Viral Marketing
Every marketer dreams of the following scenario: You convey your message to a select group of individuals. Each of these individuals then repeats your message to one or more of their friends, who in turn repeat the message to one or more of their friends, and before you know it, your message has successfully reached millions of people.
The most recent example of this dream scenario was the Facebook campaign to have Betty White host Saturday Night Live. A 29-year-old man from San Antonio started the campaign with the modest goal of gaining 5,000 fans on the Betty White to Host SNL (please?)! Facebook page. He reached his goal in a month and wrote a letter to Lorne Michaels, the Executive Producer of SNL, to encourage the selection of Betty as a host. The story was then picked up by major news agencies. A few months later, the Facebook page had 500,000+ fans, Betty White hosted SNL, and the show grabbed its highest ratings in 18 months.
Viral Marketing on Twitter
Dream scenarios are by definition rare. How much can you reasonably expect to extend your reach beyond your group of followers through viral marketing on Twitter?
The vehicle for viral marketing on Twitter is the retweet. It’s the indicator of how many times your message is repeated throughout the Twittersphere. To get an idea of what the typical results of viral marketing on Twitter are, let’s take a look at what some of the most retweeted users are able to achieve.
I have focused the above analysis on business or news oriented sites, since I am addressing the question of how business marketers can extend their reach through Twitter. Therefore, no celebrities or other non-business entities are included.
The data shows the total number of followers for each user, the average number of tweets they make each day, the largest number of retweets they generated from a single tweet during the week of May 13-19, and an estimate of the % increase in reach they achieved over and above their follower base with their best tweet of the week.
In the calculation of the percentage increase in reach, the analysis assumes that, on average, each user who retweets is followed by 300 people. Although 93.6% of Twitter users have fewer than 100 followers, I’ll use HubSpot’s estimate of an average of 300 followers for the most active 5 million Twitter users. This makes intuitive sense, since the users most likely to retweet are among the most active, and in a long-tail distribution the average is higher than the median due to the effect of the users with the most followers.
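The calculation itself is a one-liner. Here is a sketch of it, where the function name and the example account are hypothetical and the only number taken from the analysis is the 300-follower assumption. Note that it also assumes no overlap between the retweeters' followers and your own.

```python
AVG_FOLLOWERS_PER_RETWEETER = 300  # the estimate adopted in this post

def reach_increase_pct(retweets, followers, avg=AVG_FOLLOWERS_PER_RETWEETER):
    """Estimated % increase in reach over the follower base from retweets.

    Assumes every retweeter has `avg` followers and that none of them
    already follow the original account.
    """
    extra_reach = retweets * avg
    return 100.0 * extra_reach / followers

# Hypothetical account: 20,000 followers, best tweet retweeted 50 times.
print(reach_increase_pct(retweets=50, followers=20_000))  # 75.0
```

The same number of retweets produces a smaller percentage gain as the follower base grows, because the follower count sits in the denominator.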
The data is surprising. I would have expected that the number of retweets would have been much higher for the six users with more than one million followers. Mashable wins the award for most retweets at 1,018. The award for the greatest increase in reach goes to a tweet by HubSpot at 135%, more than doubling its reach via retweets. Although it was only retweeted 159 times, the much smaller number of followers of HubSpot in comparison with Mashable translates into a greater percentage increase in reach.
The data seems to indicate that the users with the relatively smaller following have the opportunity to gain the most in percentage reach. This is not surprising, since for users whose following exceeds one million, there is only so far that they can extend their reach.
The Limits of Viral Marketing on Twitter
The above examples show how much the best tweets of some of the most retweeted users are able to extend their reach via viral marketing on Twitter during a typical week. The two best case scenarios, for HubSpot and Avinash Kaushik, range from a 60% – 135% extension of their reach. In terms of the dream scenario for viral marketing, these gains may not seem like much, but in practical terms, any time that you can increase the effectiveness of your marketing by 60%+, that’s significant.
Twitter is not the best platform for viral marketing. Tweets are ephemeral; they come and go. Twitter lacks the permanence of a blog post or a Facebook page, making it harder to achieve the explosive exponential growth of a true viral campaign. And the dream scenarios of viral marketing are not achieved via a single marketing medium; they are achieved through a perfect storm of mutually reinforcing marketing media. For example, the Betty White campaign was heavily reinforced by traditional news media.
When it comes to Twitter, it may be best to remember Avinash Kaushik’s tweet: “Success on twitter comes fm participating in conversations & adding value. It does not come fm ‘social media campaigns’.”
Tweets are ephemeral. Chances are, unless a person is engaged with Twitter when you tweet, they aren’t going to have the opportunity to read it. Not only do people ignore or lose track of old tweets, they are dropped from the Twitter database. Unlike a blog, which is long-lived and indexed by Google for future reference, tweets are heavily time-dependent. If you’re running a business trying to reach as many customers as possible with your corporate message, the timing and frequency of your tweets are critical to your success.
In a previous post, I examined the question of what time of day is best to tweet. To determine the answer, I analyzed two sets of data representing the behavior of two different groups of followers. It turned out that each group had a different best time of day to tweet, and that a single tweet reached between 10% and 24% of the followers.
That brings up the question: if you employ multiple tweets, what percentage of your followers can be reached? Guy Kawasaki recommends posting your most important tweets 4 times, 8 to 12 hours apart, to reach as many of your followers as possible. Let’s take a look at this question for the same two groups of followers analyzed in my previous post.
A quick review: I collected data over the course of several weeks for two Twitter groups – followers of a company supplying services to event professionals, and followers of a company selling CRM software. The data set consisted of several thousand tweets, including the username, the time and day of the tweet, and the tweet itself. For the purpose of this analysis, I assumed that the best indicator of a given follower’s availability to read tweets was whether or not they had tweeted within a given hour. I was then able to determine for any given hour of the day, how many unique followers were active, and presumably reading their Twitter stream.
To figure out the impact of multiple tweets on reach, I then ranked all of the hours in the day in order of how many unique twitterers there were during any given hour. Choosing each hour in order of priority, I then eliminated duplicates.
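Here is a minimal sketch of that procedure, under the same assumption that a follower who tweets during an hour is available to read tweets in that hour. The function name and the toy activity log are hypothetical; my real data set was several thousand tweets.

```python
from collections import defaultdict

def cumulative_reach(activity, total_followers):
    """Return [(hour, cumulative % of followers reached)] in priority order.

    activity: iterable of (username, hour_of_day) pairs, one per tweet.
    """
    # Unique followers seen tweeting in each hour of the day.
    active = defaultdict(set)
    for user, hour in activity:
        active[hour].add(user)

    # Rank hours by how many unique followers were active.
    ranked_hours = sorted(active, key=lambda h: len(active[h]), reverse=True)

    # Walk the hours in priority order, counting each follower only once.
    reached = set()
    curve = []
    for hour in ranked_hours:
        reached |= active[hour]
        curve.append((hour, 100.0 * len(reached) / total_followers))
    return curve

# Hypothetical log for a group of five followers.
log = [("a", 10), ("b", 10), ("c", 10), ("a", 11), ("d", 11), ("e", 15)]
for hour, pct in cumulative_reach(log, total_followers=5):
    print(f"{hour}:00  {pct:.0f}%")
```

Because duplicates are removed, the second hour adds only the followers who were not already reached in the first, which is why the curve flattens as more tweets are added.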
The results for the group of event professionals are as follows:
The graph displays the cumulative percentage of unique followers that can be reached with each additional tweet. For example, a single tweet during the best hour of the day can reach 24% of the followers, two separate tweets during the two most active hours reach 40%, and three reach 50%. The graph shows that with the four tweets of Guy Kawasaki’s rule of thumb, you can reach 60% of your followers (we’ll see later that these four tweets should not take place 8 – 12 hours apart). For this group of event professionals, it takes eight tweets to reach 80% of the followers.
Now let’s examine how reach is affected by multiple tweets for the CRM software group:
As you can see, there are dramatic differences between the two groups in the extent to which a given percentage of followers can be reached with the same number of tweets. For example, it takes ten tweets to reach 60% of the CRM software group, compared to the four tweets needed to reach 60% of the event professionals group.
Each group is different. If your business needs to make sure its message is reaching the widest possible audience, you need to develop a similar analysis for your group of followers.
The graphs above show only the number of tweets required to reach a given percentage of followers, but not what times to tweet. The chart below reveals when each tweet should be deployed to achieve the reach shown in the graphs above.
Note that the first three tweets for each group, while not in identical order, occur in the 10:00 AM – 12:59 PM time period. After the first three tweets, the best time for additional tweets varies according to the group. In both cases, Guy Kawasaki’s rule of thumb – four tweets 8 – 12 hours apart – would not maximize reach. To be fair to Guy, his rule of thumb may well work for his group of followers. The point I’m making here is that you can’t generalize – there is a different strategy to maximize reach for each group of followers.
- If you are trying to maximize the effective reach of your message, the ephemeral nature of a tweet puts a premium on the timing and frequency of your tweets.
- A single tweet will only reach a fraction of your followers. For the two groups examined, the range was 10% to 24%.
- By analyzing the times during which your followers tweet, it is possible to develop a strategy to predict the percentage of your followers that you can reach with multiple tweets.
- It is also possible to determine the best times of day for multiple tweets. Note that the multiple tweets don’t necessarily have to take place during one day; they can be spread out over several days so as not to annoy your most attentive followers.
- Every group of followers is different. You need to analyze the tendencies of your followers to determine the optimal strategy for maximizing the reach of your most critical messages.
Social media is now being extended beyond its original applications into a tool for predicting the future. The exponential growth in social media has helped create a large body of content that reflects the trends, experiences, evaluations, and sentiment of the marketplace. It is becoming increasingly apparent that this content can be mined and analyzed to help predict the size of markets, the outcomes of marketing campaigns, and marketing ROI. In this post I’ll take a look at three ways in which data generated by social media has been used recently for Predictive Marketing.
Predicting Movie Box Office Results with Twitter
A group of researchers at HP Labs recently published a paper describing how they used data captured from Twitter posts to predict box-office revenue at the movies. The researchers extracted 2.89 million tweets from 1.2 million users referring to 24 different movies over a period of three months. For each tweet, the timestamp, author, and tweet text were collected and used for analysis. The researchers focused on what they termed the “critical period” – the week before and the two weeks after the release of a movie.
An initial analysis of the tweets revealed that:
- The tweets built up in volume the week before the movie release; peaked at the time of the release; and fell during the two weeks following the release.
- The average number of tweets made by individuals about a particular movie was between 1 and 1.5.
- The distribution of tweets by individuals showed that a handful of individuals made many tweets; the distribution followed the “long tail” power distribution that frequently occurs on the web.
The team then proceeded to analyze the data for predictive power. First, what didn’t prove to be very good predictors of box office success:
- Prior to the release of a movie, studios promote the film heavily via TV, print, news releases, interviews with the stars, and trailer videos. The researchers classified tweets according to whether they contained URLs, indicating that they could reference trailers, movie reviews, or other PR about the movie. It turned out that although 22% to 40% of the tweets contained URLs, such tweets were only mildly predictive of box office success.
- The percentage of retweets was in the 11%-12% range, and was even less predictive of box office success. This is surprising, given that retweets are indicative of word-of-mouth.
There were three factors that proved to be powerful predictors of box office revenue:
- The tweet rate, or the number of tweets about a given movie per hour. This is indicative of the overall attention and interest that the movie is generating. This factor is particularly important in predicting the box office revenue for the opening weekend.
- Positive sentiment about the movie. The researchers created a customized method for analyzing positive and negative sentiment about movies for the purposes of this study. Sentiment proved to be an important factor in predicting box office revenue in the weeks after the opening.
- Distribution, or the number of theaters in which the film screened. The wider the distribution, the more opportunity that existed for revenue generation.
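As a rough illustration of the first two factors, the sketch below computes a tweet rate and a simple positive-sentiment share from a handful of tweets. Everything here is an assumption for illustration: the function names, the example timestamps, and the idea of summarizing sentiment as the share of positive tweets among those labeled positive or negative. The researchers built their own customized sentiment method, which this does not reproduce.

```python
from datetime import datetime

def tweet_rate(timestamps, start, end):
    """Average number of tweets per hour in the window [start, end)."""
    hours = (end - start).total_seconds() / 3600
    in_window = [t for t in timestamps if start <= t < end]
    return len(in_window) / hours

def positive_share(labels):
    """Share of positive tweets among those labeled positive or negative."""
    positive = labels.count("positive")
    negative = labels.count("negative")
    if positive + negative == 0:
        return None  # no opinionated tweets to summarize
    return positive / (positive + negative)

# Hypothetical tweets about one movie over a four-hour window.
start = datetime(2010, 1, 1, 0)
end = datetime(2010, 1, 1, 4)
stamps = [datetime(2010, 1, 1, h, m) for h, m in
          [(0, 5), (0, 40), (1, 15), (2, 30), (3, 10), (3, 55)]]
labels = ["positive", "negative", "positive", "neutral"]

print(tweet_rate(stamps, start, end))   # 1.5 tweets per hour
print(positive_share(labels))
```

Numbers like these, together with the theater count, are the kind of inputs a predictive model of box office revenue would take.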
Using these data as a predictive model, the team was able to demonstrate that they could predict opening weekend box office revenue with 97.3% accuracy. This compared favorably with a well-known prediction tool for movies, the Hollywood Stock Exchange (HSX), that had 96.5% accuracy. The results for these two techniques are shown in the graph below:
The exciting implication of this study is not this particular application of social media to predicting box office revenue, but that a model has now been developed that uses social media to predict a wide variety of outcomes, from product sales to elections. The researchers conclude:
While in this study we focused on the problem of predicting box office revenues of movies for the sake of having a clear metric of comparison with other methods, this method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.
Using Social Media to Predict Election Results
The predictive model used to forecast outcomes using social media developed by the HP Labs team is one of the few that is customized, well-defined, and mathematically rigorous. However, that hasn’t stopped people from predicting outcomes from social media trends, even if they lack such a powerful tool.
One of the most stunning election outcomes in the past few years was the victory of Scott Brown over Martha Coakley in the special Massachusetts U.S. Senate election to fill the seat left vacant by the passing of Ted Kennedy. Larry Kim published a blog post five days before the election forecasting the upset for Scott Brown.
At the time, conventional polls suggested that the race was too close to call, despite the fact that the Democrat Coakley had been the early front runner and had a seeming lock on the seat, given the nature of the Massachusetts electorate. However, while Coakley coasted through the campaign, the hard work and grass roots effort employed by the Brown campaign paid huge dividends. Kim’s analysis of social media trends showed that Brown had developed a huge advantage in social media presence:
- 10:1 Advantage in YouTube video views
- 4:1 Advantage in Facebook fans
- 3:1 Advantage in Twitter mentions
- 10:1 Advantage in estimated web traffic
The trend in web traffic as measured by Alexa was particularly telling:
While the data looked overwhelming, Kim, to his credit, cautioned that heavy users of social media were undoubtedly a biased sample, perhaps not representative of the electorate as a whole, and he lacked a quantitative predictive model such as that employed by the HP Labs team. However, the trends were so lopsided that, on the basis of this data, Kim concluded that Scott Brown was headed for a victory. Five days later, Brown proved the prognostication accurate.
Using Social Media to Predict Fashion Trends
One final example of the predictive power of social media involves the notoriously fickle and unpredictable fashion world.
Luke Brynley-Jones of Our Social Times reports that Geoff Watts from StyleSignal has developed a new social media monitoring tool that helps to track new fashion trends. The tool is used to monitor the sites of opinion leaders in the fashion world. The data collected by the tool is then analyzed offline to predict fashion trends. Case studies on the site claim that StyleSignal has helped correctly predict the colors, shapes, and styles that become trendsetters.
It has become increasingly clear that social media can be used to predict future events with accuracy. Now that predictive models have been developed for quantifying and measuring the accuracy of predictions, you should expect to see explosive growth in the use of social media in forecasting.