On Tuesday, we discussed “Digging for Gold with a Simple Tool: Validating Text Mining in Studying Electronic Word-of-Mouth (eWOM) Communication,” by Chuanyi Tang and Lin Guo (2013).
This article approaches text mining from a marketing perspective, testing whether it offers useful information for the study of eWOM (electronic word-of-mouth, in this case online reviews). Tang and Guo conclude that while a review's star rating is the best predictor of the attitude the review expresses, text mining can add further nuance.
Much of our conversation centered on LIWC (Linguistic Inquiry and Word Count), the software Tang and Guo used for their study. Essentially an amped-up text tagger, LIWC checks each word in a text against its dictionaries and reports what percentage of the text's words fall into each category.
LIWC’s main strength seems to be its dictionaries, which are thoroughly researched and allow fairly sophisticated tagging of words by a range of features: parts of speech, emotions (positive or negative), and many other categories such as “Body,” “Ingestion,” “Time,” “Money,” and “Religion,” over 400 categories in all. In the case of online reviews, for example, Tang and Guo found that “Negations” and “Money” were both effective predictors. These dictionaries are, however, proprietary, and Christine pointed out how difficult it is to access the full dictionaries in the latest version of the LIWC software.
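To make the dictionary idea concrete, here is a minimal Python sketch of dictionary-based word counting in the spirit of LIWC. The category names and word lists are invented toy examples, not LIWC's proprietary dictionaries, and the trailing-asterisk prefix matching is an assumption loosely modeled on how LIWC-style dictionaries list word stems. A word can count toward several categories, and the output is the share of all words that hit each category.

```python
import re
from collections import Counter

# Toy dictionaries invented for illustration; these are NOT LIWC's word lists.
# An entry ending in "*" matches any word that starts with that prefix.
TOY_DICTIONARIES = {
    "negation": ["no", "not", "never", "nothing"],
    "money": ["money", "cash", "price*", "cheap*", "expensive", "pay*"],
    "positive_emotion": ["good", "great", "love*", "happ*"],
    "negative_emotion": ["bad", "awful", "hate*", "disappoint*"],
}


def matches(word, entry):
    """True if word matches a dictionary entry (exact, or prefix for '*')."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionaries=TOY_DICTIONARIES):
    """Percentage of the text's words that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        # A single word may be tagged with more than one category.
        for category, entries in dictionaries.items():
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1
    total = len(words)
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in dictionaries
    }
```

On a short review, `category_percentages("Not cheap, but never bad for the price.")` returns `{'negation': 25.0, 'money': 25.0, 'positive_emotion': 0.0, 'negative_emotion': 12.5}`: two of the eight words are negations and two are money-related.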
We tested LIWC on a segmented version of Dickens’s David Copperfield, chosen as a classic coming-of-age story, but weren’t able to find any strong trends across the segments. Overall, the paper made an interesting counterpoint to previous work we’ve discussed on text mining in the humanities: online reviews come with star ratings to check the text-mining results against, whereas in literary texts it’s not always so easy to validate the results.
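The post doesn't say how the novel was segmented. One plausible approach, assumed here, is to split the text into equal-length word chunks and score each chunk separately, so that a trend would show up as scores drifting from segment to segment. The sketch below uses a hypothetical toy word list and a plain set lookup in place of LIWC.

```python
import re

# Hypothetical toy category; a stand-in for a real LIWC dictionary.
POSITIVE = {"happy", "glad", "love", "kind", "hope"}


def segment_scores(text, category_words, n_segments):
    """Split text into n_segments near-equal word chunks and return the
    percentage of each chunk's words that belong to category_words."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i in range(n_segments):
        # Integer boundaries give exactly n_segments chunks (some may be
        # empty if the text has fewer words than segments).
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        segment = words[start:end]
        hits = sum(1 for word in segment if word in category_words)
        scores.append(100.0 * hits / len(segment) if segment else 0.0)
    return scores
```

For example, `segment_scores("happy glad sad dark love kind cold grey", POSITIVE, 4)` returns `[100.0, 0.0, 100.0, 0.0]`, an alternating pattern rather than a steady rise or fall. With a whole novel, one would plot these per-segment scores to look for a trend.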
Thanks to Christine Chou for suggesting the piece and taking the time to give us a great overview!