A Cloudy Future: The Possibilities and Perils of “Big Data” for Social Movement Research

By Thomas Elliott

Technological developments in the past decade have moved an incredible amount of human interaction online. Facebook, Twitter, and other social networking sites have collapsed the producer/consumer dichotomy (and created the cringe-worthy portmanteau “prosumer”) such that nearly everyone online is generating an enormous amount of data about themselves and their interactions. Social scientists are gaining increasing access to this data, either through web scraping technologies or through application programming interfaces (APIs) provided by the services themselves. The web has also made it easier to collect data directly, through online survey services and similar technologies. The result is datasets larger than most sociologists have ever dealt with, which leads us to the frustratingly ambiguous term “big data.”

While these new data sources are likely to enable social movement scholars to pursue new research questions and develop new theory, we should be cautious. We should consider what kinds of biases are inherent in this type of data. What types of interactions and activities are recorded by the sources of our data? What types are not recorded? Who or what are we missing if we focus on data collected from sites like Twitter and Facebook? What are the implications of services like Snapchat, which account for a significant amount of online interaction but whose records of interaction are not available? These questions will set the scope of the new theory and research questions derived from this data. Smaller organizations with fewer resources may rely on services like Twitter and Facebook to reach their members and the broader public, while organizations with more resources may continue to use more traditional tools. Likewise, organizations with a geographically broader constituency may use Facebook and Twitter more than organizations with a much more local focus, which may rely instead on more targeted methods of interaction, including face-to-face meetings and phone calls. These online services are also likely to be biased towards certain types of activities. Activities that involve the participation of large numbers of people are likely to be represented, while more professional types of activities, like lobbying, are not.

These new services also raise questions about the nature and form of social movements. Is Gamergate, a misogynist collection of largely anonymous participants in particular online forums who oppose the growing presence and influence of women in the gaming community, a social movement? Their actions, which largely consist of extreme forms of harassment, are entirely online, the participants are anonymous, and there is seemingly no formal organization; however, the targets of this group, female gamers who have called for better portrayals of women in video games, have felt very real consequences. Do these services facilitate a new form of mobilization? Or are these simply old forms translated to a new arena?

Another avenue of research we should pursue is how the structure of these services constrains how people use them, and what this means for how movements use them. Twitter, for example, only allows tweets to be 140 characters long (a legacy of the service’s initial use of SMS for delivering tweets). This constrains how conversations can happen on Twitter, giving rise to the hashtag: a word or short phrase that stands in for a larger concept. Where do hashtags like #YesAllWomen or #BlackLivesMatter fit in social movement theory? Are they the new frame? How might movement organizations use hashtags to their advantage? Are there instances of movement organizations using hashtags successfully? #BlackLivesMatter is an interesting case where the hashtag spawned the organization. How does this inform our understanding of movement emergence? Are hashtags a tactical tool for movements to use, part of the larger opportunity field, or both?

In addition to theoretical questions like those above, there are also practical concerns. These new data will require new analytic methods. Regression simply won’t cut it when you’re talking about millions of observations. This is especially true if much of the data is text, as data pulled from Twitter is. Since the techniques needed to analyze these types of data are not built into the statistical tools sociologists typically use (e.g., Stata or SPSS), these analyses require a shift in the analytic technologies social movement scholars use. Currently, that shift is towards Python, a computer programming language that is easier to use than most that came before it, but that still requires a very specialized set of skills to use properly. This suggests we should seek out collaborations with computer scientists or informaticians, who have been working on the analysis of massive datasets for much longer.
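To give a sense of what working with text data in Python can look like, here is a minimal sketch that counts how many tweets mention each hashtag. The sample tweets, the `HASHTAG` pattern, and the `count_hashtags` function are all hypothetical and exist only for illustration; a real project would pull text from an API or an archive and would likely need more careful text handling.

```python
import re
from collections import Counter

# Hypothetical sample tweets; real data would come from an API or an archive.
tweets = [
    "Marching downtown today #BlackLivesMatter",
    "Read this thread. #YesAllWomen #yesallwomen",
    "Solidarity from Oakland #blacklivesmatter",
]

# A simple pattern: "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Count how many texts mention each hashtag, ignoring case."""
    counts = Counter()
    for text in texts:
        # Lowercase and deduplicate within a text so that each text
        # contributes at most one count per hashtag.
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        counts.update(tags)
    return counts


hashtag_counts = count_hashtags(tweets)
for tag, n in hashtag_counts.most_common():
    print(tag, n)
```

Counting is only a starting point, but the same loop structure carries over to more interesting measures, such as which hashtags appear together or how their use changes over time.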

New tools for analysis also require us to think about how we store and share data. For massive datasets, storage in spreadsheets is impractical, so we should think about using storage engines that were designed for massive datasets and that easily hook into Python (or whatever analysis tool we use) so that Python can access the data when it performs our analysis. Sharing data will be a concern too. For datasets that are measured in gigabytes or larger, it may make more sense for these databases to be accessible online (if there are no privacy concerns) so that anyone wanting to use the data for their own analysis can easily download only what they need rather than copying and setting up the full dataset. Setting up these systems will require more resources (in time, in hardware, and in skills) than have previously been needed to store and analyze data. As we move forward, we may want to consider collaborating to develop our own tools and services to make this type of data collection, storage, and analysis available to any interested scholar, regardless of their skill level with the more technical side of “big data.”
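As a rough illustration of the "retrieve only what you need" idea, the sketch below uses SQLite through Python's built-in `sqlite3` module. SQLite is a stand-in here, chosen only because it ships with Python; it is not necessarily the right engine for a truly massive dataset. The table layout, the sample rows, and the `fetch_by_hashtag` function are assumptions made up for this example.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; a shared
# dataset would live on disk or on a server instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "id INTEGER PRIMARY KEY, posted_on TEXT, hashtag TEXT, body TEXT)"
)

# Hypothetical rows standing in for a much larger collection.
rows = [
    ("2014-05-25", "yesallwomen", "Read this thread."),
    ("2014-12-04", "blacklivesmatter", "Marching downtown today"),
    ("2014-12-05", "blacklivesmatter", "Solidarity from Oakland"),
]
conn.executemany(
    "INSERT INTO tweets (posted_on, hashtag, body) VALUES (?, ?, ?)", rows
)
conn.commit()


def fetch_by_hashtag(connection, hashtag):
    """Return (posted_on, body) pairs for one hashtag, oldest first."""
    cursor = connection.execute(
        "SELECT posted_on, body FROM tweets "
        "WHERE hashtag = ? ORDER BY posted_on",
        (hashtag,),
    )
    return cursor.fetchall()


# The query pulls back only the matching rows, not the whole table.
subset = fetch_by_hashtag(conn, "blacklivesmatter")
for posted_on, body in subset:
    print(posted_on, body)
```

The point of the design is that the filtering happens inside the database, so an analyst asks for a slice of the data and never has to load or copy the full dataset.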

The possibilities presented by the availability of new sources and quantities of data are certain to provide new insights into how social movements emerge, operate, and succeed. However, these new data require careful and thoughtful attention to their limitations and challenges to avoid misuse and abuse.

