October 29 OPEN LAB

Our OPEN LAB time on October 29 was split into two parts: a discussion of an article on authorship attribution, and the first step of a Python tutorial on reading a CSV file, calling a web API, and then rewriting the file. The article:


Ayaka Uesaka and Masakatsu Murakami, "Verifying the authorship of Saikaku Ihara's work in early modern Japanese literature; a quantitative approach," Literary & Linguistic Computing, first published online September 29, 2014, doi:10.1093/llc/fqu049 (9 pages).

Our discussion centered first on the article's assumptions and methodology and then on authorship attribution and its role more generally. One point of contention was that the article attempted to distinguish one epistolary work from the rest of an assumed body of Ihara Saikaku's works without taking into account the major stylistic differences between epistolary writing and typical fiction in Edo-period Japan (1600-1868). The stylistic differences that led the authors to suspect that Saikaku did not write this particular work could therefore simply reflect the difference in genre. We thought that closer attention to genre could be a productive and interesting direction for this kind of research.

The Python tutorial covered reading in a CSV file with the unicodecsv library, importing libraries in general, and accessing items in a list. It also demonstrated how to construct a URL and call the China Biographical Database (CBDB) web API through urlopen(). Stay tuned: we will read the data returned by the API at the next OPEN LAB.
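A minimal sketch of these steps, written for Python 3. Under Python 3 the standard csv module reads Unicode text directly, so it stands in here for unicodecsv (which was mainly needed under Python 2). The CBDB base URL and the query parameter names (name, o) are assumptions for illustration, not confirmed details of the API, and the helper names read_names, build_query_url, and fetch are hypothetical; check the CBDB API documentation before relying on any of them.

```python
import csv
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint for CBDB person queries; verify against the CBDB docs.
CBDB_API_BASE = "https://cbdb.fas.harvard.edu/cbdbapi/person.php"


def read_names(lines):
    """Return the first column of every data row (header row skipped).

    `lines` can be an open file or any iterable of CSV-formatted strings.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    # Each row is a list of strings; row[0] is its first item.
    return [row[0] for row in reader if row]


def build_query_url(name, output_format="json"):
    """Build a query URL; the parameter names "name" and "o" are assumptions."""
    query = urlencode({"name": name, "o": output_format})  # percent-encodes CJK text
    return CBDB_API_BASE + "?" + query


def fetch(url):
    """Call the API with urlopen() and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

To use it on a real file, open it with `open("names.csv", newline="", encoding="utf-8")` (the filename is hypothetical), pass the file object to read_names, and hand each URL from build_query_url to fetch. Using urlencode rather than string concatenation keeps names in Chinese characters safe to put in a URL.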

October 14 OPEN LAB

Laura Gibson, from the Annenberg School for Communication, led our discussion of "The Battle for 'Trayvon Martin': Mapping a Media Controversy Online and Offline." The authors collected a range of media mentioning "Trayvon Martin" (and the common misspelling "Treyvon Martin") from Twitter, blogs, online media outlets, newspapers, and television in order to understand the media ecosystem around the killing. They used MIT's Media Cloud to produce much of their evidence. We were especially interested in how they created the data set and how other scholars might use their infrastructure; Laura's research group is currently working with them. We discussed the promising avenues for sharing mined data (in this case, the identification of certain newspaper articles) and the legal complications of actually gathering the text for subsequent research. We also examined the various methods and tools the researchers used to analyze their evidence.

In the last part of the session, Molly led a fabulous Python tutorial based on Brian Vivier's presentation the previous week. We walked through object types, easy ways of building request paths for an API, and frameworks for querying and organizing imported CSVs.
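The three ideas from this tutorial can be sketched together as follows. This is our own illustration, not a reconstruction of Molly's code: the base URL is a placeholder, the column names used with it are hypothetical, and csv.DictReader is just one convenient way to organize a CSV by its header row. One object-type point worth keeping in mind is that every value read from a CSV is a str, so numbers need converting with int() before arithmetic or numeric comparison.

```python
import csv
from collections import defaultdict

# Placeholder base URL; substitute the real API you are querying.
API_BASE = "https://example.org/api"


def api_path(*parts):
    """Join path segments onto the base URL, e.g. api_path("person", 1762)."""
    segments = [str(part).strip("/") for part in parts]  # str() handles ints too
    return "/".join([API_BASE.rstrip("/")] + segments)


def load_rows(lines):
    """Parse CSV lines into a list of dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def filter_rows(rows, column, value):
    """Query: keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


def group_by(rows, column):
    """Organize: map each distinct value in `column` to the rows that have it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def column_as_ints(rows, column):
    """CSV values arrive as str objects; convert a numeric column with int()."""
    return [int(row[column]) for row in rows]
```

Keeping each row as a dict means queries can refer to columns by name (row["dynasty"]) rather than by position (row[1]), which makes the code easier to read and less fragile if the CSV's columns are reordered.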

Erhardt Graeff, Matt Stempeck, and Ethan Zuckerman, "The battle for 'Trayvon Martin': Mapping a media controversy online and off-line," First Monday, volume 19, number 2-3, February 2014.