Computational Linguistics for Literature

Andreas van Cranenburgh represented The Riddle at the conference of the North American Chapter of the Association for Computational Linguistics (NAACL, June 3–8, 2012, Montreal). He writes:
In the main conference, the first paper that caught my eye was one about a
task called “multiple narrative disentanglement.” This is simply the problem
of recognizing, in running text, the different narratives, each with its own
set of characters and storyline. The paper introduces a method and applies it
to the famously complex novel ‘Infinite Jest’ by David Foster Wallace.

There was also an interesting invited talk by James W. Pennebaker, a social
psychologist, who reported that it is possible to recognize a writer’s gender or
emotional state (such as whether someone’s depressed) by counting simple words
like ‘the’, ‘because’, and ‘I’ (function words).
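
To make the idea of counting function words concrete, here is a minimal sketch
in Python. The word list, the function name and the sample sentence are made up
for illustration; this is not Pennebaker’s actual method or word list, only the
basic counting step that such analyses build on.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list chosen for illustration only.
FUNCTION_WORDS = ("the", "because", "i")

def function_word_rates(text, words=FUNCTION_WORDS):
    """Return the relative frequency of each listed word in `text`."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: counts[w] / total for w in words}

sample = "I stayed home because the rain would not stop, and I was tired."
rates = function_word_rates(sample)
for word, rate in rates.items():
    print(f"{word}: {rate:.3f}")
```

Relative frequencies are used rather than raw counts so that texts of different
lengths can be compared. A real study would use a much longer word list and
feed these rates into a statistical model.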

Another application of stylometry (measurements intended to judge the style of
texts) was in a poster on scientific articles. Coincidentally, this work also
introduces the use of syntactic fragments, although in a different way than in
my own work. The difference is that this work uses a
classifier that is trained to find the best weights for its features, so the
fragments and other features need to be known in advance. The aim of the work
is to guess the gender of the author of a paper, whether they are a native
speaker, and whether the paper was presented at a conference or a workshop.

Finally, the most interesting event with respect to literature and other
digital humanities was the workshop “Computational Linguistics for
Literature,” in which I presented my work on authorship attribution with
syntactic fragments. Although authorship attribution is a well-established task
with good results (typically obtained by counting function words, as in
Pennebaker’s talk), the idea of looking at full syntax trees and allowing any
recurring pattern in them to play a role is new. The poster generated useful
feedback and I will continue to work on this method to better understand how it
works and what else could be achieved with it. Other interesting papers in the
workshop analyzed the different voices in the poem The Waste Land by T.S.
Eliot and looked into the difference between professional and amateur poetry.

