The proliferation of ample, publicly available information from varied sources is an outcome that we should celebrate as scholars. While this proliferation does not offer a panacea for all research needs, it does offer numerous insights unavailable to scholars a generation ago. This potential for insight stems not only from increased data availability, but also from the sociological imagination and creativity of movement scholars who can leverage the flexibility afforded by modern information systems.
For this dialog, I would like to focus exclusively on “quantitative data” from publicly available sources, collected using passive methods, for the purpose of better understanding social mobilization. In doing so, I will sidestep or only lightly touch upon subjects such as online activism and its implications offline, active data collection (e.g., administering online surveys), questions of research ethics and privacy related to online media, and methods of data analysis. Though these related topics warrant significant attention in their own right, serious attention to them would detract from my ability to succinctly address this dialog’s theme.
It’s no secret that social mobilization and movement participation are incredibly difficult concepts to measure. Traditionally, quantitatively minded movement scholars have measured these concepts using data from surveys, archival materials, and hand-coded newspaper articles. Without question, these approaches have generated incredibly rich, customized datasets produced by experts in their respective areas of inquiry, and have informed the vast majority of our knowledge on social movements. Unfortunately, collecting such data demands a tremendous investment of time and money. Such demands preclude rapid, systematic analyses capable of engaging the general public at its peak point of interest in a movement. Given the ephemeral nature of most movements, these methods put us at a disadvantage with regard to practicing public sociology. Further, as funding opportunities continue to diminish in the social sciences, the costs associated with these methods will prove difficult to meet. Moreover, given the limitations associated with surveys, archival materials, and newspaper coding, I think we can develop more robust theories by exercising creativity in the data collection process. After all, if the success of a theory depends on a particular data source, it likely reflects the information analyzed rather than a general social property.
Though certainly not unproblematic, today’s social movement scholars, alongside journalists and laypersons, can collect data on social mobilization and movement participation relatively easily. Naturally, such research often turns to social media. What makes collecting social media data so appealing is that movement actors want to be heard, and online platforms provide a venue for them in democratic and “democratish” regimes. Such venues can dramatically increase the representation of smaller events and organizations relative to traditional data sources. By now, collecting and visualizing social network data on subjects like movement hashtag usage on Twitter has become commonplace, due in part to open source software like NodeXL and Gephi, as well as general tools like Netlytic, each of which can collect and process a huge body of information within moments, for free, using a point-and-click interface. Likewise, Google Trends offers both time series and cross-sectional data on the frequency of search terms, which could indicate the relative popularity of movement-associated phrases across both space and time. Unfortunately, when taken purely at face value, such descriptive findings from social media sources leave much to be desired. Beyond the issue of offline implications, scholars face incredible challenges generalizing across social media platforms, as each has a different user base (e.g., Vkontakte is typically described as “Russian Facebook”), different features (e.g., an emphasis on image sharing versus text), and different user interface rules that govern observed outcomes (e.g., the algorithm behind Facebook’s newsfeed versus Twitter’s timeline, directed or undirected relationships in user association). Problems regarding spamming, trolling, and thread hijacking introduce further analytic headaches.
Many of these issues can be ameliorated by an informed selection of the platform analyzed, narrow search parameters that best address the case and platform, filtration of misleading cases (preferably by automated procedures), and a strong appreciation for the time and location that produced the search results. In short, providing informed analyses of movements in social media requires rich domain knowledge of both the platform and the case analyzed.
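To make the idea of automated filtration concrete, here is a minimal sketch in Python. It assumes posts have already been collected into simple records; the `Post` class, the `filter_posts` function, and the `max_tags` threshold are all hypothetical names and values chosen for illustration, not part of any platform's API. The heuristics (a required hashtag, a time window, a cap on hashtag stuffing, and removal of verbatim repeats) are crude stand-ins for the case-specific judgment described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record for one collected post.
    author: str
    text: str
    created_at: datetime


def hashtags(text):
    """Return the set of lowercase hashtags found in a post's text."""
    return {tok.strip(".,!?:;").lower() for tok in text.split() if tok.startswith("#")}


def filter_posts(posts, tag, start, end, max_tags=5):
    """Keep posts that use `tag` between `start` and `end`, dropping likely noise."""
    kept, seen = [], set()
    for post in posts:
        tags = hashtags(post.text)
        if tag.lower() not in tags:
            continue  # off-topic for this case
        if not (start <= post.created_at <= end):
            continue  # outside the mobilization window of interest
        if len(tags) > max_tags:
            continue  # hashtag stuffing often signals spam or thread hijacking
        key = (post.author, post.text.strip().lower())
        if key in seen:
            continue  # verbatim repeat from the same account
        seen.add(key)
        kept.append(post)
    return kept
```

Any thresholds of this kind would need to be tuned to the platform and the case, and the discarded posts are worth inspecting by hand before trusting the filter.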
It’s important not to take for granted that mobilization occurs at a specific time and place. We have never had better access to information on location and time. Barring content expiration, tweets on Twitter can be searched by time and geolocation and therefore linked to protest sites, producing communication networks of people in close proximity; likewise, photos posted on Flickr often include time and location information embedded in the picture’s EXIF metadata, potentially providing visual information on a protest that could be used to estimate event sizes or even be processed using image analysis techniques. Imagine the possibility of systematically assessing protest participant characteristics (like gender, attire, or propensity to smile) using a combination of photos taken at protest events and image analysis techniques like those underlying Jetpac! We could then produce rough approximations of protest event demographics, an incredibly elusive topic in our field. Given that we can now access protest event data pinpointed to the exact time of day, date, and location, the traditional longitudinal unit of country-year seems grossly underspecified in comparison.
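As a rough illustration of linking geotagged posts to a protest site, the sketch below selects posts that fall within a given distance and time window of an event. It assumes each post and event is a plain dictionary with `lat`, `lon`, and `time` keys; that layout, the function names, and the default radius and window are hypothetical choices made for this example. Distance is computed with the standard haversine formula, which treats the Earth as a sphere.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def posts_near_event(posts, event, radius_km=1.0, window=timedelta(hours=3)):
    """Return posts within `radius_km` of the event and within +/- `window` of its time."""
    return [
        p for p in posts
        if abs(p["time"] - event["time"]) <= window
        and haversine_km(p["lat"], p["lon"], event["lat"], event["lon"]) <= radius_km
    ]
```

The same matching logic would apply to photos once their time and coordinates have been read from EXIF metadata, though many uploads carry no location information at all.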
Beyond rich information from protest sites, we now have a tremendous capacity to triangulate data sources with common temporal and locational information to capture “environmental factors” which encompass movement actors’ and nonparticipants’ collective contexts. Such factors have traditionally spoken to theories related to political opportunity, resource mobilization, and infrastructure. Businesses and municipal governments alike have increasingly made such data publicly available and conveniently accessible. For businesses, websites like Google Maps, Yelp, and Foursquare provide a way to inform and attract potential customers. Services like these can, in turn, tell researchers about the availability of local, public meeting places; organizational ecology; lifestyles characterized by affluent or impoverished consumption practices; or even the distances to social movement organization offices. Data from municipal governments (e.g., Seattle, Moscow) often cover subjects that include healthcare facilities, housing, and transportation infrastructure, which can identify social cleavages that encourage or discourage collective action among the city’s residents. In the recent past, such information existed only in municipal archives and nearby libraries. Considering how convenient it has become to access data from distant locations, along with the global diffusion of the “open data movement” (e.g., Uganda and Kazakhstan), comparative research on movements will likely develop further in the near future.
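One simple way to triangulate sources that share locational information is to assign every record to a grid cell and join on the cell. The sketch below counts venues (for example, meeting places drawn from a business listing or a municipal dataset) in the same cell as each protest event. The dictionary layout, the function names, and the 0.01-degree cell size are hypothetical; cells defined in degrees are not equal in area and ignore venues just across a cell boundary, so this is only a coarse approximation of local context.

```python
import math
from collections import Counter


def cell(lat, lon, size_deg=0.01):
    """Map a coordinate to an integer grid cell of `size_deg` degrees per side."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def venue_counts_by_cell(venues, size_deg=0.01):
    """Count how many venues fall in each grid cell."""
    return Counter(cell(v["lat"], v["lon"], size_deg) for v in venues)


def attach_context(events, venues, size_deg=0.01):
    """Return copies of the events with a count of venues sharing their grid cell."""
    counts = venue_counts_by_cell(venues, size_deg)
    return [
        {**e, "nearby_venues": counts[cell(e["lat"], e["lon"], size_deg)]}
        for e in events
    ]
```

A distance-based join or a proper spatial index would be more precise, but a grid join is easy to audit and extends naturally to adding a time bin alongside the spatial cell.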
While these developments offer tremendous potential, numerous difficulties will remain. On the upside, our research will likely see a decrease in data collection time and expense, an increase in analytic specification, a diversification in explanations for movement phenomena, and greater potential for comparative research. Unfortunately, issues of data coverage and representation will continue. Naturally, it will continue to be difficult to make substantial statements on clandestine and illegal movement organizations, as they either avoid public forums online or they obscure their identities and activities.
While access to historical materials has markedly improved as more and more texts have been scanned and digitized through efforts like Google Books and Project Gutenberg, the volume of information on recent movements will greatly overshadow that of their predecessors. Lastly, at least in the short term, we can expect the digital divide to affect movement and participant representation in the resultant datasets, such that more comprehensive datasets will likely over-represent the positions taken by privileged groups.