The use of the internet by contemporary social movements is providing immense amounts of data to social movement scholars, and more and more researchers are devoting their energy to studying this phenomenon. The size, scale, and richness of these data, however, present a number of methodological hurdles. Alex Hanna highlights some methods we can use to analyze the content of text-based internet data–using Facebook as a case study–in his article “Computer-Aided Content Analysis of Digitally Enabled Movements” (Mobilization: An International Quarterly 18(4): 367-388, 2013).
Hanna did an immense amount of work to prepare his data for analysis, he thought carefully about his methods, and his descriptive statistics go a long way toward an empirical understanding of the way social movements use Facebook. His article is a great introduction to automated text analysis methods and is an example of how these methods can be used to study social movements and internet data. The article also highlights the difficulty of using automated methods to extract knowledge from text-based data. Because he didn’t provide any full quotes from the posts he was analyzing, or examples of how his methods were modeling these posts, I was left a bit unsatisfied.
In the article Hanna analyzes wall posts from a Facebook group that was formed to support the April 6, 2008 strike of textile workers in the city of Mahalla al-Kobra. While the group was initially expected to attract a few hundred users, it quickly mushroomed to over 70,000 members, becoming the largest Facebook group in Egypt that addressed politics. Hanna collected over 60,000 wall posts made to this group between March 23, 2008 and May 29, 2008. He then analyzed how members of this group used Facebook for different types of mobilization: offline coordination, internet action, media and press, reporting and events, and requests for information.
If this group was used for mobilization, he theorized, then the posts before and on the day of the event of interest should focus on coordinating offline activities to enable the event, on media and press as activists attempt to gain media attention, or on both. The sheer number of posts made hand-coding everything a practical impossibility, so Hanna set out to test his theory using computer-aided content analysis techniques.
Hanna used two methods under the broad umbrella of computer-assisted content analysis. The first was word counts, which are what they sound like: counting the number of words related to outcomes of interest (in Hanna’s case, his five categories) and often examining how these counts change over time. Through word counts Hanna found that offline coordination words did not follow the expected pattern, but media words did indeed spike on April 6 and May 4 (the two days of action). These results suggest users did attempt to use the Facebook group to mobilize the media for these events, but did not use it to organize offline activities.
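To make the word count idea concrete, here is a minimal sketch in Python. It is not Hanna's code: the keyword lists, category names, and example posts are all invented for illustration, and the article's actual dictionaries are not reproduced here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword lists (assumption): two of the five categories,
# with made-up English keywords standing in for the real dictionaries.
CATEGORY_KEYWORDS = {
    "offline_coordination": {"meet", "square", "march"},
    "media_and_press": {"journalist", "channel", "newspaper"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def daily_category_counts(posts):
    """Count keyword hits per category for each day.

    posts: iterable of (date_string, text) pairs.
    Returns a dict mapping date -> Counter(category -> number of hits).
    """
    counts = defaultdict(Counter)
    for date, text in posts:
        for token in tokenize(text):
            for category, keywords in CATEGORY_KEYWORDS.items():
                if token in keywords:
                    counts[date][category] += 1
    return dict(counts)


# Invented example posts (assumption), not taken from the data set.
posts = [
    ("2008-04-05", "Meet at the square before the march"),
    ("2008-04-06", "Which channel or newspaper is covering this?"),
    ("2008-04-06", "No journalist has shown up yet"),
]

result = daily_category_counts(posts)
for date in sorted(result):
    print(date, dict(result[date]))
```

Note that the second and third example posts both add to the media count even though neither is clearly an attempt to mobilize the press. A count records that a word appeared, not why it was used.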
The second method Hanna used was a supervised machine learning method. For this he had two research assistants hand-code a sample of the total posts into six categories: the five mobilization categories and a category for none. He then used the software package ReadMe, which “learns” how to categorize text from the hand-coded posts and reports the proportion of the uncoded posts in each category. The most notable result from this method was the high proportion of posts in the none category, suggesting the Facebook group was not predominantly used for mobilization purposes, even though the frequency of total posts rose around the days of mobilization. Within the five mobilization categories, however, Hanna interestingly found the opposite of the results from his word count method: the proportion of posts in the offline coordination category did increase before and on the days of action, suggesting activists did use Facebook to coordinate offline activity, but the media and press category did not change notably over time.
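The general shape of the supervised approach can be sketched in a few lines of Python. This is not ReadMe and does not reproduce its algorithm: as a stand-in, the sketch trains a small multinomial naive Bayes classifier on hand-coded posts, labels each uncoded post, and reports the share of posts per category. The categories shown, the training posts, and the uncoded posts are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_posts):
    """Collect the counts a multinomial naive Bayes model needs.

    labeled_posts: iterable of (text, category) pairs.
    """
    doc_counts = Counter()             # posts per category
    word_counts = defaultdict(Counter)  # token counts per category
    vocab = set()
    for text, category in labeled_posts:
        doc_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the most probable category for one post."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing so unseen category/word pairs are not zero.
                score += math.log(
                    (word_counts[category][token] + 1)
                    / (total_words + len(vocab))
                )
        if score > best_score:
            best, best_score = category, score
    return best


def category_proportions(uncoded_posts, model):
    """Classify every uncoded post and report the share in each category."""
    labels = Counter(classify(text, model) for text in uncoded_posts)
    return {c: labels[c] / len(uncoded_posts) for c in model[0]}


# Invented hand-coded sample (assumption), three of the six categories.
hand_coded = [
    ("meet at the square at noon", "offline_coordination"),
    ("bring banners to the march", "offline_coordination"),
    ("call the newspaper about the strike", "media_and_press"),
    ("send photos to the tv channel", "media_and_press"),
    ("good morning everyone", "none"),
    ("i love this song", "none"),
]

# Invented uncoded posts (assumption).
uncoded = [
    "where do we meet for the march",
    "good morning to everyone here",
    "i love this group",
    "tell the newspaper and the channel",
]

model = train(hand_coded)
proportions = category_proportions(uncoded, model)
print(proportions)
```

Classifying each post and then counting is the simplest way to get category proportions, but errors on individual posts can bias the totals. That is one reason a tool that reports proportions directly, as ReadMe is described as doing above, may behave differently from this sketch.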
Combining the results from the two methods, he concludes that the use of this Facebook group did “not fit neatly into what we consider e-mobilization efforts” (385). This is an important finding, and the difficulty of getting to it cannot be overstated.
In general I think his results would have been much easier to interpret if he had provided actual quotes from Facebook posts, showing what types of posts were in select categories. It would have then been easier for the reader to imagine what these word counts and categories might mean in practice. What were some examples of the way Facebook was used to coordinate activities? How was it used to get the attention of the press? What were some of the posts like in the none category?
Beyond this general suggestion I have two main comments following his analysis; the first is a comment on his seemingly contradictory findings, the second is a suggestion about where to go next.
1) How can we understand the opposing results from his two methods? While the word count method found that media words peaked on April 6 and May 4 and offline coordination did not follow the expected pattern, the machine learning approach found that the offline coordination category rose around the days of action but the media and press category did not. Why this discrepancy, and what does it mean? Does it reflect something in the data, or is it simply an artifact of the ambiguity of language and the difficulty of using automated methods to study it? I believe his findings are not as contradictory as they may appear, and that they reveal something about the data. One potential explanation is that members of the Facebook group might have been complaining about the way the media was covering their actions and events. Such posts would have produced a spike in the media word count on the days of action, but would not have been placed in the media and press mobilization category. There are other possibilities. I wish Hanna had explored what these findings mean a bit more, including by providing some examples of the Facebook posts and how they are modeled through his methods. This would have clarified the ambiguity of his results and given more insight into how to use and interpret his methods.
2) What was going on in the none category? If not for mobilization, how else were the members in this Facebook group using this platform? Hanna clearly could not address everything in the limited space in this article, but I hope his future articles will unpack the ambiguity in this none category and suggest some conclusions about what the users were saying in these posts.
I look forward to more research from Hanna in the future, and to the increasing use of these methods by more social scientists.