I think my conscious experience of 'big data' began when school administrators decided to put RFID chips in our school IDs and required us to wear them in a visible spot on our shirts all day, every day. It seemed like a bad idea - I don't know what data they hoped to gather from tracking our movements around the school, since we had a class schedule we were supposed to follow anyway - but someone in administration probably said the magic words 'assessment' and 'data,' and we got chipped. That was in junior high. Since then the push for data, and the concomitant rise of 'big data' as a concept and a buzzword, has approached critical cultural mass. As big data has become more ubiquitous, we've grown more and more used to being seen as data, as content; when someone steals our data to use as their own, we say they've 'stolen our identity,' as if we were reducible to the quantifiable aspects of our lives. More than this, our informational identity is now a kind of online currency: we constantly give away our information as the price of admission to 'free' social media applications, email service providers, and online sales aggregators. But is that actually good?
Obviously some would say yes. If you google 'big data,' about two-thirds of the results will be hopeful think pieces and suppositious how-tos laying out various hot takes on the uses and amazing future of big data. A company called Knewton is working with Houghton Mifflin and Gutenberg Technology "to create smarter digital textbooks that adapt to individual student[s] in real time" (and to sell massive amounts of hardware and software, much like the Los Angeles iPad education boomlet you may remember from 2013, which didn't turn out so well). Doubtless the ability to 'adapt' to an individual student will require data input from that student, data which will in turn become saleable. This seems like a more complicated version of being chipped, and while the companies involved feel this is a good use of data, I'm not so sure. On the other hand, the NIH is currently funding the Models of Infectious Disease Agent Study (MIDAS), which hopes to predict disease outbreaks by looking at Wikipedia searches related to dengue fever and influenza in seven different languages; so far this approach noticeably outperforms more traditional public health initiatives. One of the researchers involved, Lauren Ancel Meyers of the University of Texas at Austin, has also created a data visualization tool called the Texas Pandemic Flu Toolkit, and this kind of application of big data seems, on the whole, remarkably good. In both cases the common denominator is data, but clearly there are many different kinds of 'good' when it comes to how we use the vast amounts of information we gather.
Others have begun to work on the politics of big data, asking what our informational ethics should be. As we've written here before, we believe in Open Access and sharing information, and that includes data. However, Gerry Canavan recently pointed out via a series of tweets that some information-sharing is not exactly chosen but required, especially in the academy. In a two-part Storify aggregating the tweets Canavan, a professor at Marquette University, sent out on this topic in response to another user's complaints about dissertation embargoes, he points out that "[t]o assert 'the commons' as a global value, and then use it to grab the intellectual property of the people on the bottom rung, is bad." We agree; populating the commons through forced participation makes the exercise seem more like data farming than data sharing, and the inevitable question of who sows and who reaps follows hard on the heels of such a shift. Others, less concerned with the political than with the technological aspects of big data, point out that even big data is being superseded by huge data/smart data/green data, and Cisco Systems predicts that by the end of next year we will have entered the era of the zettabyte.
So information sharing and data mining are here to stay, at least until we lose power and all the data disappears. What should we do about that, as a community of scholars whose attention is focused on writing center research and practice? Praxis believes we should jump into the data pool, but with proper flotation devices at the ready and a group of friends to splash around with. That is why, later this summer, Praxis will launch PRX, a data-sharing research exchange featuring downloadable data sets both (very) large and small. Our vision is for PRX to give researchers a place to share their data, promoting dialogue and making the field more horizontal in terms of data access for individual scholars, but we also see an application for writing center administrators. As every writing center administrator knows, data-based assessment is crucial to the survival of a writing center, and we believe a more concentrated focus both on quantitative scholarship and on the data itself is sorely needed. Without a sustained, dynamic conversation on the topic, how are writing center administrators supposed to know what data to collect, what to do with it, and how to present it to others? How can we have a serious conversation about the politics of data collection, and especially of data presentation, without a forum?
So we are now inviting data submissions to PRX, Praxis' newest offshoot, which will launch soon. To learn more, email the Praxis managing editor, Thomas Spitzer-Hanks, and please help us make this project a success. We are also soliciting blog posts on the subject of data, from practical how-tos to theoretical and political ruminations on life in the zettabyte era, and we'll be sharing the stories of some of the people we're watching who are doing exciting things with data. If you have data for us, have thoughts to share on data, or know someone who talks about big data (ironically or not), drop us a line.
In the meantime, try to actually notice the next time you give away your data, and ask yourself: to whom, and for what purpose, did I just give it? In a present of gigantic data breaches and a near future of the internet of things, how will we use data without being reduced to a number or, what may be worse, forgotten altogether?