By Jen Schradie
I know, I know, it’s digital blasphemy to say that using Internet data is a terrible way to study social movements. What about all of those Twitter and Facebook revolutions of the Arab Spring? And Occupy Wall Street? #Ferguson and #BlackLivesMatter spread like wildfire, for God’s sake.
You may think that I’m a luddite who doesn’t see the sheer statistical splendor and speed of social network diagrams or automated text analyses made from Tweets. Or, perhaps you’re thinking that old-school scholars just don’t get it: digital activism is the future, so we need to disrupt, innovate and flatten those hierarchical Marxist social movement sociologists.
But before you reach through your screen and strangle me with your iPhone charger cord, consider these ways in which online data, whether social media or otherwise, might not be as representative or generalizable as they are fast and efficient.
1. Hashtag data are often cherrypicked
A problem in social movement studies, in general, has been selecting on the dependent variable. This means that researchers choose movements to study because they display a certain factor and then treat those same cases as evidence of that factor’s importance. With online Big Data, this means that we need to be careful not to choose movements to study based on high levels of Internet use and then use those cases as evidence that the Internet is critical for social movements. This skews theories and leads us to make generalizations, often celebratory, about digital activism. A related problem is selection bias. How representative is the hashtag of the research question at hand? What gets lots of Tweets or Facebook chatter may just be successful protests at the peak of their visibility. We can’t just study the extraordinary and not the ordinary.
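To make the cherrypicking problem concrete, here is a minimal sketch in Python. The numbers are entirely made up for illustration, and the `Movement` class and `success_rate` helper are hypothetical names, not drawn from any real dataset or library. In this toy population, heavy Internet use has no relationship to success, yet a sample built only from visible, successful, heavily online movements makes the Internet look indispensable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    heavy_internet_use: bool
    succeeded: bool


def success_rate(movements):
    """Share of movements in the list that succeeded (None if the list is empty)."""
    movements = list(movements)
    if not movements:
        return None
    return sum(m.succeeded for m in movements) / len(movements)


# Hypothetical population: success is equally likely (25%) with or without
# heavy Internet use. These counts are invented, not empirical.
population = (
    [Movement(True, True)] * 10
    + [Movement(True, False)] * 30
    + [Movement(False, True)] * 10
    + [Movement(False, False)] * 30
)

heavy = [m for m in population if m.heavy_internet_use]
light = [m for m in population if not m.heavy_internet_use]

# The cherrypicked sample: only movements that trended online AND succeeded.
cherry_picked = [m for m in population if m.heavy_internet_use and m.succeeded]

rate_heavy = success_rate(heavy)
rate_light = success_rate(light)
rate_cherry = success_rate(cherry_picked)
share_heavy_in_sample = sum(m.heavy_internet_use for m in cherry_picked) / len(cherry_picked)

print(f"Success rate, heavy Internet use: {rate_heavy:.2f}")
print(f"Success rate, light Internet use: {rate_light:.2f}")
print(f"Success rate in cherrypicked sample: {rate_cherry:.2f}")
print(f"Share of cherrypicked sample with heavy use: {share_heavy_in_sample:.2f}")
```

In the full population the two success rates are identical, so Internet use explains nothing. In the cherrypicked sample every case is both heavily online and successful, which leaves no variation to compare against and invites exactly the celebratory conclusion the data cannot support.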
2. Big Data is too Small
Huh? Aside from the grammatical mistake in this phrase (data are plural), Big Data, particularly Tweets and Facebook posts, exclude those who are on the other side of the digital divide. A gaping hole in the digital cloud is that online Big Data can leave out the poor and working class. Despite assertions that digital inequality is passé, it remains alive and well, even in highly wired societies. Inequality persists between those who can afford consistent and high-quality Internet connectivity and those who cannot. And Internet data are gleaned not just from those who consume online content but also from those who produce it. When it comes to creating publicly available Internet content, those with more income and education are overrepresented among digital content creators, as class is the most significant demographic determinant of online production, even compared to gender, race, or ethnicity. People who are not wired, let alone lack the resources, skills or confidence for content creation, are often absent from online data analytics. Our hashtag data may not be big enough, not in sample size (“n”), but in generalizability.
Oh, don’t call me a digital dualist. Yes, the line between the online and offline worlds is murky, if it exists at all. Still, we should not be fetishizing online spaces as where everything happens. In a digital activism class I taught, I gave my undergrads an assignment to follow the Twitter stream of a protest movement I had been studying. From conducting in-depth interviews and observations, I knew that Twitter had nothing to do with the grassroots organizing that made this particular protest part of the largest movement in the state for decades, yet many of the students concluded that the protests would not have happened without social media. This is a common problem in digital activist scholarship that focuses on the online – attributing way too much agency and power to technology without understanding the offline context.
4. So much data, too few qualitative methods
Ok, some of my best friends use quantitative methods. Wait, I use quantitative methods, and with online data, too. Big Data is a veritable gold mine, with datasets so big and software so easy to use that it gets our adrenaline going. I started doing network analysis with an early (clunky) version of UCINET with manual data input. But when I used NodeXL, software for quickly downloading and analyzing online data, I was hooked on the quick and easy number crunching. But what are we losing in the process? We also need thick descriptions, depth, and detail of everyday social movement practices, as well as better answers about the how of digital activism. Plus, bots! Bots and spam flood online data, creating lots of noise. Qualitative methods, including online ethnographies and digital interviews, help sort through and contextualize digital content.
5. Online data can ignore societal structures
I’m trying really hard not to say ‘technological determinism.’ This theory has been junked, tweaked and hacked for decades. In its purest form, this is the idea that technology directly and independently affects societal outcomes. Few scholars claim, for instance, that social media alone cause social movements to succeed. The broader danger, though, is when the Internet is positioned as the almighty independent variable, rather than the reverse. What about how different societal structures shape technology use? Facebook posts and tweets, in particular, as units of analysis tend not to point toward bigger questions for social movement scholarship, such as the role of the state or political economy.
Ok, so is this post really about online Big Data as Bad Data? Well, not quite. I used the (in)famous Buzzfeed approach with my title as clickbait. If you’ve never seen Buzzfeed, you probably hate cats. Buzzfeed is an online news and entertainment site that has been wildly successful at getting users to click on its site by using provocative headlines like mine. But herein lies the problem. Social media are like clickbait for social movement researchers. They’re a wonderful drug – in moderation. If you haven’t guessed it by now, I use online/social media/Big Data in my research. I’ll let others point out its advantages. There are many. But this post, more than anything else, is not really meant to advise you against using this type of data. Instead, it is a cautionary tale.
In another life, I used to produce and direct videos for public agencies. But we would always ask our clients if another medium – print, web site, etc. – could accomplish the goal they wanted to accomplish with a video. Yes, we tried to talk them out of using the medium that was our livelihood because it wasn’t always the best way for these organizations, often struggling for resources, to get their message out. Similarly, does online data really answer the research question you want to ask? Or is it just so much fun to manipulate Big Data? Hint: it’s ok if you answered “Yes” to both.