Let’s LIWC at The Riddle from a slightly different perspective…

The Riddle is in its final phase. Over the next two years, many of the experiments we have carried out will be reported in one more PhD thesis, in scholarly articles, and in a book for everyone who enjoys reading and who may even have taken part in our National Reader Survey in 2013. One item still on our to-do list was to apply stylometric tools to older fiction, using the knowledge we gained about the literary conventions of contemporary novels. In April 2017, Floor Naber started a short stint with the Riddle team at Huygens ING to do an experiment with LIWC. She will apply the software Linguistic Inquiry and Word Count (LIWC) to the Riddle corpus and to a corpus of late nineteenth-century Dutch novels, to test where old and new fiction are similar and where they differ.

PhD defense Andreas van Cranenburgh

2 November 2016: Andreas van Cranenburgh defends his PhD thesis ‘Rich statistical parsing and literary language’.

This thesis studies parsing and literature with the Data-Oriented Parsing framework, which assumes that chunks of previous experience can be exploited to analyze new sentences. The chunks we consider are syntactic tree fragments.

After presenting a method to efficiently extract such fragments from treebanks based on heuristics of recurrence, we employ them to develop a multilingual statistical parser. We show how a mildly context-sensitive grammar can be employed to produce discontinuous constituents, and compare this to an approximation that stays within the efficiently parsable context-free framework. We show that tree fragments allow the grammar to adequately capture the statistical regularities of non-local relations, without the need for the increased generative capacity of mildly context-sensitive grammar.
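To make the notion of a recurring tree fragment concrete, here is a toy sketch. It is a brute-force simplification, not the extraction method from the thesis: it enumerates every fragment rooted at every node (a fragment keeps a node's full production and may cut off any nonterminal child as a bare frontier node) and keeps those seen at least twice. The tuple-based tree encoding and the function names `fragments_at`, `nodes` and `recurring_fragments` are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is a tuple (label, child, ...); a child is either
# another tree or a word (str). A frontier node, where a fragment is cut off,
# is written as a one-element tuple (label,).


def fragments_at(node: tuple) -> list[tuple]:
    """Every fragment rooted at `node`.

    Each nonterminal child is either cut off (kept as a bare frontier node)
    or expanded with one of its own fragments; words are always kept.
    """
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]


def nodes(tree: tuple):
    """Yield every nonterminal node of `tree`, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)


def recurring_fragments(treebank: list[tuple], min_count: int = 2) -> dict[tuple, int]:
    """Fragments occurring at least `min_count` times across the treebank."""
    counts = Counter()
    for tree in treebank:
        for node in nodes(tree):
            counts.update(fragments_at(node))
    return {frag: n for frag, n in counts.items() if n >= min_count}
```

On two sentences such as ‘the cat sleeps’ and ‘the dog sleeps’, a fragment like `("NP", ("DT", "the"), ("NN",))` recurs while `("NN", "cat")` does not. The number of fragments grows exponentially with tree size, which is why exhaustive enumeration only works on toy examples and why a more efficient extraction method is needed for real treebanks.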

The second part investigates what distinguishes literary novels from other novels. We work with a corpus of novels and a reader survey with ratings of how literary they are perceived to be. The main goal is to find out to what extent the literary ratings can be predicted from the texts. We first evaluate simple measures such as vocabulary richness, text compressibility, and the number of cliché expressions. In addition, we apply more sophisticated predictive models: a topic model, a bag-of-words model, and a model based on syntactic tree fragments. We find that literary ratings are predictable from textual features to a large extent. While it is not possible to infer a causal relation, this result clearly rules out the notion that these value judgments of literary merit were arbitrary or predominantly determined by factors beyond the text.
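Two of the ‘simple measures’ mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's exact definitions: vocabulary richness is taken here to be a type-token ratio averaged over fixed-size windows (the window size of 1000 is an assumption), and compressibility is taken to be the bzip2-compressed size divided by the original size. The crude regex tokenizer is also an assumption.

```python
import bz2
import re


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a deliberately crude tokenizer (assumption)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str], window: int = 1000) -> float:
    """Mean type-token ratio over consecutive windows of `window` tokens.

    Averaging over fixed-size windows reduces the dependence of the plain
    type-token ratio on text length.
    """
    if not tokens:
        return 0.0
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    # Drop a short trailing window unless it is the only one.
    if len(chunks) > 1 and len(chunks[-1]) < window:
        chunks = chunks[:-1]
    return sum(len(set(c)) / len(c) for c in chunks) / len(chunks)


def compression_ratio(text: str) -> float:
    """Compressed size / original size; lower means more compressible."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(bz2.compress(raw)) / len(raw)
```

A highly repetitive text scores low on both measures: repeating one 12-word sentence 20 times gives 240 tokens with only 12 types (a ratio of 0.05) and compresses to a small fraction of its size.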

Link to the thesis: http://dare.uva.nl/record/1/543163

Prehistory of The Riddle

In 2014, the editors of the Utrecht journal for literary studies Vooys asked project leader Karina van Dalen-Oskam for a short article about the background of the project The Riddle of Literary Quality. The editors of Vooys gave permission to post an English translation of this article on the Riddle blog. The original publication is titled ‘The Riddle of Literary Quality. Op zoek naar conventies van literariteit’ and was published in Vooys: tijdschrift voor letteren 32 (2014), 3, pp. 25-33.

The Riddle of Literary Quality: The search for conventions of literariness


First results of reader survey online

Recently, the first results of the large reader survey that is part of the project The Riddle of Literary Quality were published on the website of Het Nationale Lezersonderzoek. The survey could be filled in from March 4 until September 27, 2013. We are very happy that 13,782 readers did so in total. Some of the first results (in Dutch) can be found on the survey site, and more will follow. Of course, we are now starting work on analyzing as many of the 400 novels on the list as possible, to find out whether there are correlations between features of the texts, readers’ opinions, and readers’ predominant reading role. Results in this area will take some more time, but we are sure that they will yield interesting new insights into what a book needs in order to be evaluated as literary or not, good or bad, by different kinds of readers.

Launch of The National Readers’ Enquiry

On March 4, 2013, the survey of the project The Riddle of Literary Quality was launched on http://www.hetnationalelezersonderzoek.nl/ . This “National Readers’ Enquiry” hopes to reach many thousands of respondents. As befits the nature of our project, the survey is in Dutch. All readers of Dutch among you are welcome to rate a set of novels (originals and translations) published in the Netherlands during the last five years. The list contains the novels that were borrowed most from public libraries and that ranked highest on the bestseller lists of the last three years. Enjoy!

New members of the project team

The Riddle team has recently grown considerably. PhD student Corina Koolen has been affiliated with the project since September 2012, project assistant Fernie Maas joined on 1 November, and Kim Jautze started work on her PhD on 15 November. Now that the team is complete, more news is sure to follow!

Computational Linguistics for Literature

Andreas van Cranenburgh represented The Riddle at the North American Chapter of the Association for Computational Linguistics (NAACL, June 3–8 2012, Montreal). He writes:
In the main conference, the first paper that caught my eye was one about a task called “multiple narrative disentanglement.” This is the problem of recognizing, in running text, the different narratives, each with its own set of characters and storyline. The paper introduces a method and applies it to the famously complex novel ‘Infinite Jest’ by David Foster Wallace.

EACL 2012, Avignon

Last week, Andreas van Cranenburgh attended the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL is one of the foremost conferences in the field of computational linguistics; this year’s acceptance rate was well below 30%. There were only five other papers on parsing, and all of them dealt with dependency parsing rather than constituency parsing (two different representations of the syntactic structure of sentences). This meant that Andreas’s poster on discontinuous parsing was the only one to focus on constituency structures, which are commonly used in Data-Oriented Parsing and are therefore relevant to our project.

One paper in particular stood out for its relevance to our project: ‘Character-based kernels for novelistic plot structure’. The paper presented a method to analyze and compare the plot structure of novels. For example, the relations between characters can be extracted as a social network, as can their ‘emotional development’, based on a list of emotion-related words. The resulting information is used to produce a similarity metric for texts. One graph, for example, plotted the emotions of the protagonist of a Jane Austen novel, showing strong peaks corresponding to a proposal, an elopement, and the protagonist’s marriage. It is encouraging to see that even a relatively superficial linguistic analysis can reveal interesting details of literary texts.
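The idea of tracking ‘emotional development’ with a word list can be illustrated with a much simpler sketch than the paper's. Assumptions: the tiny `EMOTION_WORDS` set is a hypothetical placeholder, not the list used in the paper; the trajectory is computed for the whole text rather than per character; and two texts are compared by cosine similarity of their trajectories, which is a plain stand-in for the paper's kernel-based similarity.

```python
import math
import re

# Hypothetical toy lexicon for illustration only; the paper used its own
# list of emotion-related words, which is not reproduced here.
EMOTION_WORDS = {"love", "joy", "hope", "happy", "fear", "grief", "anger", "shame"}


def emotion_trajectory(text: str, segments: int = 10) -> list[float]:
    """Proportion of emotion words in each of `segments` equal-sized slices."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    if segments < 1 or n < segments:
        raise ValueError("need at least one segment and one token per segment")
    # Integer boundaries: every slice is non-empty and the slices cover the text.
    bounds = [i * n // segments for i in range(segments + 1)]
    return [
        sum(tok in EMOTION_WORDS for tok in tokens[lo:hi]) / (hi - lo)
        for lo, hi in zip(bounds, bounds[1:])
    ]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length trajectories (0.0 if one is all zeros)."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Using the same number of segments for every novel gives trajectories of equal length, so any two texts can be compared regardless of how long they are; peaks in a trajectory mark stretches of text dense in emotion words.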

Parsing in Avignon

Riddle PhD student Andreas van Cranenburgh will present a poster at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). The conference will be held at the University of Avignon from April 23 to April 27, 2012. The title of the paper is ‘Efficient parsing with linear context-free rewriting systems.’ It presents results on parsing with discontinuous constituents from his Master's thesis, defended in October 2011. More about the conference can be found at http://www.eacl.org/ . A pre-print of the paper is available at http://staff.science.uva.nl/~acranenb/eacl2012.pdf

A dark and stormy night

Most people will recognize the phrase “It was a dark and stormy night” from the Peanuts comics, in which Snoopy types this sentence on his typewriter. Snoopy’s creator Charles M. Schulz, however, was referring to the famous opening sentence of a nineteenth-century novel by Edward Bulwer-Lytton. The sentence even has its own Wikipedia page. It is famous because it is seen as the ultimate example of what is often considered a bad writing style. This paradoxical situation, in which a sentence became famous for being bad, makes it a perfect illustration for the project The Riddle of Literary Quality. The illustration was designed by communication expert Johan Kwantes.