November 4 WORD LAB – “Retranslating Musical Comedy for Shanghai’s Left-wing Film Movement”

Maddie Wilcox presented her project “Retranslating Musical Comedy for Shanghai’s Left-wing Film Movement.” She framed her problem, a question of how to name a genre, and told us how she explored it and began to solve it.

In the 1930s, the filmmaker Yuan Muzhi called for a new kind of film to be made in China. He named it yinyue xiju, a term that translates directly as “musical comedies.” However, the films he was advocating were not what one would classify as musical comedies under Western definitions, a genre that was already part of both the foreign and the homegrown film scene in China at the time. The name of that genre, gechang jupian, is often translated as “sing-song pictures.”

To begin exploring this question of genre names and translation, Maddie ran full-text searches in the Shanghai Library database and the Media History Digital Library and looked at how mentions of different genre names were distributed by date. This ultimately led her to consider “operetta” as a better translation for Yuan Muzhi’s idea of musical comedy.
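For readers who want to try something similar on their own corpus, here is a minimal sketch of counting mentions of genre names by year. It is not the workflow Maddie used; both databases have their own search interfaces. The function name `mentions_by_year`, the `(year, text)` record format, and the sample records are all made up for illustration.

```python
from collections import Counter


def mentions_by_year(records, terms):
    """For each term, count how many records per year mention it at least once.

    `records` is an iterable of (year, text) pairs (a hypothetical format).
    Matching is a case-insensitive substring test, so it also works for
    languages written without spaces between words.
    """
    counts = {term: Counter() for term in terms}
    for year, text in records:
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                counts[term][year] += 1
    return counts


# Made-up sample records, purely for illustration.
sample_records = [
    (1933, "A new musical comedy opens this week."),
    (1934, "The operetta was well received."),
    (1934, "Another musical comedy, and an operetta too."),
]
sample_counts = mentions_by_year(sample_records, ["musical comedy", "operetta"])
```

With these sample records, "musical comedy" is counted once in 1933 and once in 1934, and "operetta" twice in 1934. Counting records rather than raw occurrences keeps one long article from dominating a year.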

In the Q&A, we discussed different models for exploring the language around these films, including topic modeling and keywords in context. Ultimately, her research questions may be best served by moving between methods, which would allow her to surface terms that she does not yet know about and then understand how and when they are being used.
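As a rough illustration of the keywords-in-context idea, the sketch below returns each occurrence of a keyword together with a window of surrounding characters. The function name `kwic` and the `width` parameter are our own choices, not part of any tool discussed in the session, and the window is measured in characters rather than words so that it also works on unsegmented Chinese text.

```python
import re


def kwic(text, keyword, width=30):
    """Return (left, match, right) for each case-insensitive hit of `keyword`.

    `width` is the number of characters of context kept on each side.
    """
    hits = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Made-up sentence, purely for illustration.
sample_hits = kwic("The operetta was new. An Operetta again.", "operetta", width=4)
for left, match, right in sample_hits:
    print(f"{left:>4}[{match}]{right}")
```

Reading the hits side by side makes it easier to see how a term is used, which complements topic modeling's ability to surface terms one did not think to search for.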

We ended by exploring how the digital corpora she used were made and OCRed (a prelude to our discussion this past week). We discussed the lack of digitization of, and metadata for, advertisements. We then examined exactly what had been OCRed in the PDFs from the library, using a few methods.

In a lively series of experiments, using different tools on several machines (and the expertise of our Chinese readers), we uncovered which parts of the text had been OCRed, the quality of the OCR, and how it was arranged. It was, not surprisingly, dirty; however, we were surprised at how much of the text appeared not to have been OCRed at all. While her results still seemed helpfully suggestive, we discussed how much of a difference a fully OCRed text would make for keyword search.
