Prehistory of The Riddle

In 2014, the editors of the Utrecht journal for literary studies Vooys asked project leader Karina van Dalen-Oskam for a short article about the background of the project The Riddle of Literary Quality. The editors of Vooys gave permission to post an English translation of this article on the Riddle blog. The original publication is titled ‘The Riddle of Literary Quality. Op zoek naar conventies van literariteit’ and was published in: Vooys: tijdschrift voor letteren 32 (2014), 3, p. 25-33.

The Riddle of Literary Quality: The search for conventions of literariness

Karina van Dalen-Oskam

There was quite a stir when the launch of the project ‘The Riddle of Literary Quality’ was announced, on 7 July 2011. The aim of this research project is to investigate literary quality using digital methods. Discussions arose among various linguistic and literary specialists, and the usual criticisms were voiced. In spite of all this we remain as curious as ever about this project. The research has been underway for more than eighteen months, and the first results are starting to emerge. The editors of Vooys asked Karina van Dalen-Oskam, research director of the much-discussed project, to comment on the organisation, background and first results of ‘The Riddle’.


An overview of the project

In the computational humanities project entitled ‘The Riddle of Literary Quality’ (or ‘The Riddle’ for short) we are investigating the formal differences between novels — Dutch, or translated into Dutch — that readers consider to be high literature, and novels that have not been given this quality stamp. We are also looking at whether we can identify formal differences between novels that readers judge to be ‘good’ or ‘bad’. By formal qualities, we mean the textual qualities or textual attributes that can be discerned in low-level and high-level patterns. The hypothesis we would like to test is that correlations exist between formal qualities and readers’ opinions. These correlations cannot be discerned with the naked eye, but might become visible with the aid of a computer. For example, could it be true that books described as ‘literary’ by today’s readers have a more complex sentence structure on average? And if so, what other attributes do such books often have? Do novels that are considered to be ‘not very literary’ contain a smaller average number of different words? Do differences in vocabulary size, for instance, tend to coincide with differences in the use of long, usually difficult words in novels that have been rated differently? In other words, it is never about assessing a single quality or aspect at a time, but rather about frequent combinations: patterns of qualities that occur together. We can only find these by combining large amounts of data.

We define low-level patterns as qualities that are easy to measure using a computer, such as the size of the vocabulary, average sentence length, the frequency of certain words, and so forth. High-level patterns are more challenging from a computational perspective, because they require a number of technical intermediary steps, making them more difficult to analyse. It would be impossible for us to address all high-level patterns in this project; out of sheer necessity, aspects such as theme or narrative structure remain beyond the scope of automatic detection for the time being. We decided to work on syntactic complexity on the basis of previous research by Rens Bod and colleagues at the University of Amsterdam (UvA). In addition, some qualities fall between the ‘high’ and ‘low’ levels, such as word class frequencies. We have attempted to include a few of these in our analysis. Is there a difference, for example, in the use of adjectives and adverbs between novels that are considered ‘literary’ and novels from genres that are not normally given this predicate, such as thrillers or chick lit? And with what other qualities might word class frequencies be connected? It is not humanly possible to answer questions such as these, as one would have to read and keep a tally at the same time. With the current state of computer technology, however, it is possible to tackle this, and the answers may well open up new avenues of research.
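To give a concrete sense of what such low-level measurements involve, here is a minimal sketch in Python (a simplified illustration for this article, not the project’s actual software): it computes vocabulary richness, average sentence length, and the most frequent words of a text.

```python
import re
from collections import Counter

def low_level_features(text):
    """Compute a few simple low-level textual measures."""
    # Naive sentence split on end punctuation; real pipelines use proper tokenizers.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return {
        # vocabulary size relative to text length (type-token ratio)
        "type_token_ratio": len(counts) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # the most frequent words, typically function words
        "top_words": counts.most_common(5),
    }

feats = low_level_features("The cat sat. The cat saw the dog! The dog ran away.")
```

Applied to a whole novel, measures like these take seconds to compute, which is precisely what makes it feasible to tally and compare hundreds of books at once.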

We are analysing formal textual qualities using software that we have adapted to our specific objectives, or that we wrote ourselves. We apply this software to the digital files of the novels. This is possible because the e-book has also taken off in the Netherlands in recent years, and most of the publishers of the novels we wish to analyse assisted us by giving us access to the digital files of the novels. As we want to make our software available for everyone to use at the end of the project, other researchers will be able to repeat and verify our analyses. However, we are unable to provide access to the files of the novels.

We gathered readers’ opinions of the novels that we wanted to analyse by conducting a large online survey, the National Readers’ Survey (Het Nationale Lezersonderzoek). Thus we do not indicate which books we consider to have a ‘high’ or ‘low’ level of literary quality, or whether we consider a book to be ‘good’ or ‘bad’ (or somewhere in between). Instead, we are using the opinions of the almost 14,000 people who participated in our survey. At the end of the project, we will also release the data collected in the survey, so that other researchers can verify our account of the results and the data can also be used for other research.

Thus, in principle, other researchers will be able to repeat and verify all our measurements for accuracy. Obviously, there will be differences of opinion regarding our interpretations of the measurements, of both the computational text analysis and the survey. As the underlying data were produced in a transparent way, however, it will be possible to have fruitful discussions about various interpretations.


Background and research context

A project such as The Riddle, uncommon though it may be, cannot simply appear out of the blue. In fact, the project was born from a combination of developments in various fields as well as my own background.

After graduating in Dutch at Utrecht University (specialising in Middle Dutch Literature), between 1988 and 2002 I worked for the Institute for Dutch Lexicology (Instituut voor Nederlandse Lexicologie, or INL). There, I was one of the editors of a dictionary of Early Middle Dutch, the Vroegmiddelnederlands woordenboek (1200-1300). This was one of the first dictionaries in the world to be compiled making intensive use of the opportunities offered by new computer technologies throughout the process. The Vroegmiddelnederlands woordenboek was based on Dutch textual material from all surviving documents that could be dated with certainty to the thirteenth century. The majority of these texts, which included literary, official and lexicographical works, had been published in Maurits Gysseling’s monumental work, Corpus Gysseling. Every single word in this corpus had to be described in our dictionary. Texts that had only been included in later manuscripts were carefully excluded from the corpus. In this way, the dictionary and the corpus on which it was based formed a solid foundation for linguistic and lexicographical comparisons with texts dating from later periods, or texts that could not be dated with certainty to the thirteenth century.

This rigorous empirical approach was in line with international developments in lexicography (the compilation of dictionaries) and linguistics, and can be described as ‘corpus lexicography’ and ‘corpus linguistics’. At the time, corpus analysis of literature was not yet a topic of discussion, but with my background in literature I was prepared to give it some serious thought. This empirical approach, using software to analyse every occurrence of a word as part of a coherent body of texts and describe it in a dictionary entry, inspired me to consider the opportunities it might present for literary research. My own initial experiments in this direction involved an analysis of all proper names in Jacob van Maerlant’s Rijmbijbel (1271), one of the texts in the thirteenth-century corpus. I listed those names that occurred in the Rijmbijbel as well as those originating from the most important Medieval Latin source text. This allowed me to identify a number of names in text fragments that Maerlant turned out to have based on other sources. I subsequently attempted to find out why these passages had been added.

The dictionary had to be finished within ten years, because it had initially been thought that the use of computers would speed up the process considerably. In practice, this proved not to be the case. Although using computers allowed us to analyse a much larger quantity of data, making the dictionary entries far more comprehensive than had previously been possible, it did not shorten the time the analysis took. In other words, using computers hardly accelerated the process, but it did result in a significant quality improvement of the final product. Moreover, adopting a rigorous empirical approach such as this was only possible by making use of the latest technological advances.

In 2002, when the dictionary was finally completed, I left the INL to work for the Royal Netherlands Academy of Arts and Sciences (Koninklijke Nederlandse Akademie van Wetenschappen, or KNAW): first at the Netherlands Institute for Scientific Information Services (Nederlands Instituut voor Wetenschappelijke Informatiediensten, or NIWI), and then from 2005 at the Huygens Institute, and (from 2011) at Huygens ING. One of my tasks was to set up a research programme on the Digital Humanities. I was able to make an in-depth study of what had been achieved to date in the Digital Humanities (then known as Humanities Computing) in the area of literary studies. Two journals played an important role in this effort: Computers and the humanities and Literary and linguistic computing. I was disappointed by what I found. Most of the articles on works of literature described methods that were intended to help identify the unknown or disputed author of a work. The methods employed tended to be hard to fathom, not only as a result of the mathematical and statistical explanations included in the articles, but also because they focused exclusively on producing a yes/no answer. And that is where it would end, whereas for me this was precisely where it got interesting. I wanted to know how the authors being investigated differed from each other, to be able to take this as a starting point for further analyses. The methods used, however, failed to suit my purpose.

This situation began to change when John Burrows launched his ‘Delta procedure’ in 2002, which was also a method for determining the likely author of a disputed text. As a basis, he took the 150 words that occurred most frequently in a corpus containing the disputed text together with a group of texts by the candidate authors. ‘Now we’re talking’, I thought; ‘words, I can do something with them’. Burrows calculated the Delta score, which expresses the difference between texts in the corpus, on the basis of each text’s deviations from the corpus-wide average frequencies of those 150 most frequent words (I will not discuss the statistical details here). The candidate text whose frequency profile lies closest to that of the disputed text — that is, the one with the lowest Delta score — is most likely to have been written by the same author. By applying the procedure to texts for which authorship is certain, Burrows and other researchers were able to show that the method is highly successful.

The words that occur most frequently in any text tend to be function words, such as I, she, he, and, the, in, about, and so forth. Substantive words (including nouns) tend to occur at much lower frequencies than function words. It intrigued me that it was possible to distinguish between authors on the basis of frequency patterns in their use of words that were, after all, extremely common and not related to content. The fact that the method also shows which words are involved provides actual pointers for the interpretation step: how do authors differ from one another? How do they use certain linguistic elements to shape their work? Whether this is a conscious or subconscious process is irrelevant.
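Burrows’s procedure can be sketched in a few lines (a simplified illustration with invented toy texts, not Burrows’s own implementation or the project’s software): relative frequencies of the corpus’s most frequent words are standardised into z-scores per word, and Delta is the mean absolute difference between two texts’ z-score profiles. The candidate with the smallest Delta to the disputed text is the most likely author.

```python
import re
import statistics
from collections import Counter

def relative_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocab]

def burrows_delta(disputed, candidates, n_words=150):
    """Delta scores of a disputed text against candidate texts (dict name -> text)."""
    # Vocabulary: the most frequent words across the whole corpus.
    corpus = [disputed] + list(candidates.values())
    corpus_counts = Counter(re.findall(r"\w+", " ".join(corpus).lower()))
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    table = {name: relative_freqs(t, vocab)
             for name, t in {"_disputed": disputed, **candidates}.items()}
    # Per-word mean and standard deviation across all texts.
    means = [statistics.mean(f[i] for f in table.values()) for i in range(len(vocab))]
    stds = [statistics.pstdev(f[i] for f in table.values()) or 1.0 for i in range(len(vocab))]
    def z(f):
        return [(f[i] - means[i]) / stds[i] for i in range(len(vocab))]
    zd = z(table["_disputed"])
    # Delta: mean absolute z-score difference; smaller means stylistically closer.
    return {name: statistics.mean(abs(a - b) for a, b in zip(zd, z(f)))
            for name, f in table.items() if name != "_disputed"}

deltas = burrows_delta("the cat the cat the cat",
                       {"A": "the dog the dog the dog",
                        "B": "a bird a bird a bird"})
```

In this toy example the disputed text shares its dominant function word with candidate A, so A receives the lower Delta score.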

The methods used to identify authors not only succeeded in distinguishing effectively between authors, but also between genres. And one of the questions that I (among others) was soon keen to answer was: can literary novels also be seen as a distinct genre, and would we then be able to measure a difference between ‘high’ and ‘low’ literature? Max Louwerse (now Professor of Cognitive Psychology and Artificial Intelligence at Tilburg University) referred me to a collection edited by Willie Van Peer and published in 2008, The quality of literature. Linguistic studies in literary evaluation. In this collection, a variety of researchers present an extremely diverse range of ways in which to distinguish ‘high’ literature demonstrably from other forms of literature. Although all contributing authors presented useful experiments, I felt that few of them would lead to promising follow-up research. The contribution I found most useful was the last in the collection: Renate von Heydebrand and Simone Winko’s ‘The qualities of literatures. A concept of literary evaluation in pluralistic societies’ (pp. 223-239). In this article, they summarise and build upon the work set out in their book, Einführung in die Wertung von Literatur. Systematik – Geschichte – Legitimation (1996). The article in The quality of literature describes their model of analysis, based on previous research into value assignment and the canon, notably on the outcomes of empirical research in ‘social psychology and the psychology of cognition’. In the abstract, they write:

‘We argue that the evaluation of literature has to be considered in social terms, not merely as an individual act. Our model is designed to facilitate the analysis of evaluation. Its advantage, in our view, lies in abandoning the notion of literary quality as a property intrinsic to the text, without denying that there have to be textual properties corresponding to the value expectations which people bring to literature. It also provides a basis for a pluralistic evaluation of literature, going beyond the convention of aesthetic autonomy and taking into account the entire spectrum of social functions associated with literature’ (p. 122, italics added).

The italicised parts refer to the aspect that I would like to research one day, based on developments in the digital humanities and without losing sight of sociological aspects.

An opportunity to undertake such research in practice presented itself when the KNAW made funding available for a number of projects in the context of what it called the Computational Humanities Programme. The condition was that each project should involve collaboration between at least two KNAW institutes and at least one university partner. The university partner had to have a proven record in Natural Language Processing, Artificial Intelligence or Computational Sciences. The theme of the call for proposals, in short, was the development of innovative tools to identify ‘high-level patterns’ in order to answer a research question in the humanities. The proposal for The Riddle of Literary Quality, the main outlines of which were given at the beginning of this article, was approved along with three other proposals.

The Riddle is a collaborative project carried out by Huygens ING (the Huygens Institute for the History of the Netherlands), a KNAW institute; the Fryske Akademy (the FA, also part of the KNAW), in the person of Hanno Brand (now director of the FA); and the Institute for Logic, Language and Computation (ILLC) at the UvA, in the person of Rens Bod, Professor of Digital and Computational Humanities at the UvA. Three PhD students are also working on The Riddle: Andreas van Cranenburgh, Corina Koolen, and Kim Jautze, along with a software developer, Hayco de Jong. Moreover, the project brings together a think-tank of national and international specialists in the most important project areas. The project began in January 2012.


Current developments

We are now approximately halfway through the project. What have we achieved to date? We focused our activities on three areas: setting up and carrying out the National Readers’ Survey; developing the initial versions of a number of analytical tools, including testing these on a small corpus of novels; and thinking about how to make the tools we are developing available to other researchers most effectively. To begin with the last point: we feel that it is essential that other researchers should be able to verify and repeat our research results without having to learn how to program. This means we want to make our tools available through an accessible interface, and in a technical environment that anyone will find easy to access or install. We are working hard to make this happen, together with international digital humanities researchers in computational stylistics. I will not discuss the details here.

The results of the first pilot on aspects of textual analysis were published in an article entitled ‘From high heels to weed attics: a syntactic investigation of chick lit and literature’. In this article, the three PhD students and the software developer who are working on The Riddle, Kim Jautze, Corina Koolen, Andreas van Cranenburgh and Hayco de Jong, analyse a number of aspects of sentence structures in a small corpus of chick-lit novels and novels that can be considered ‘literary’ on the basis of literary awards. By applying specialised software, they show that there are statistically significant differences between the two genres. The chick-lit novels from the corpus contain more compound sentences than the literary novels, while the literary novels have more complex sentences. Furthermore, they found various indications that the language used in the chick-lit novels – in dialogues, for example – is closer to everyday language, whereas the literary novels feature more descriptive language. These results will be elaborated on in subsequent research. In a talk at the most recent ‘Achter de verhalen’ congress (26-28 March 2013 in Brussels), we combined the research into syntactic qualities with experiments that included a number of other textual qualities, including register and word frequencies. We hope to publish the results shortly.

Much work also went into setting up the readers’ opinions survey. We called in the assistance of a specialist in market research, the economic psychologist Erica Nagelhout of Nagelhout MRS. With her help we set up the National Readers’ Survey, which ran from 4 March until 27 September 2013. We asked our respondents to supply some personal details (age, gender, postcode, level of education). In addition, we wanted to know how many books they read on average each year, and whether they read only fiction, only non-fiction, or both. We also asked them to respond to a number of statements, which were intended to help us establish whether we were dealing with readers who (based on the description by Von Heydebrand and Winko) primarily assume a heteronomous reading role, reading for pleasure and considering it important to be able to identify with a character, for example, or whether they primarily assume an autonomous reading role, in which they mainly read for the aesthetic experience. In our view, it is not inconceivable that readers with different reading roles would evaluate the same book very differently. The answers to these questions will allow us to test this hypothesis in practice.

The core of the survey consisted of a list of 400 novels. First we asked respondents to tick off which of the novels they had read. For seven of the books they had ticked, they could then state how ‘literary’ and how ‘good’ (on a scale of 1 (the least) to 7 (the most)) they rated them (and then another seven books, if they so wished, etc.). In an open-ended section, we asked them to explain their opinion of one of the books they had rated. After this, we gave them the chance to give scores on the two scales for a maximum of seven books that they had not read.

As we need as many opinions as possible on each book to make statistically reliable observations, we decided to compile a list of at most 400 novels ourselves. If everyone had been allowed to come up with their own titles, it is likely that many more books would have been cited, but only a few books would have attracted enough ratings to perform reliable statistical analyses. We wanted to draw up the list of works in the most unbiased way possible. In any case, the respondents would indicate which works they considered to be ‘literary’. We did not want to do this ourselves, as otherwise the results would have been based on our own assumptions, and our conclusions would not have been convincing. We decided to base the list of books on sales and library lending figures for the three years prior to the survey, because we thought this would result in the most responses from readers. We also decided that books included in the list should not have been first published more than five years previously (unfortunately a few older titles did slip through). This is because we wanted to avoid respondents saying that such and such a book had been on their list at school, so it must be literature (in a few cases, we did encounter remarks from respondents who made such claims, for example in relation to Het diner by Herman Koch).

The criteria for drawing up the list of novels for the survey also led logically to the establishment of the corpus of novels that we want to analyse with the software under development. If, searching for correlations, we intend to combine textual, linguistic qualities with readers’ opinions, then all opinions and textual qualities must relate to the same books. We are in the process of completing our corpus as much as possible, so that we can combine text analyses and opinions in the way conceived in the second half of the project.

We are very much aware of the limitations of our research. As we are basing our research on recent novels and contemporary opinions of a group of respondents with a greater than average affinity with reading and literature (more on this later), our observations and interpretations refer exclusively to conventions of literariness for the respondents at this moment in time. From a methodological perspective, however, we do not consider this to be a disadvantage. Indeed, it allows us to create a point of reference against which we can compare the results of future research into other language areas and other time periods. In our project, we hope to be able to do this experimentally for modern Frisian literature and for American and British novels around 1900. In subsequent projects, we hope to build further on the results of these experiments and on the project as a whole, in order to gain more insight into the forming of the canon, for example, and the development of stylistic aspects as an expression of conventions of literariness over space and time.

In addition, we also expect to be able to make many socio-literary observations, and we believe we will gain more insight into the combined impact of sociological and textual factors on conventions of literariness. Thus, far from denying the great importance of sociological factors, we feel that the time is right and the technology is available to investigate which textual factors shape opinions about literature.

On the website of the National Readers’ Survey, we have published a number of results of the survey. More than 70 per cent of the respondents are highly educated, most of them women aged 50 or older. This is largely consistent with the findings of other reading surveys. For this reason, we believe the survey can give us an excellent insight into the conventions of literariness that predominate among today’s highly educated reading public. Here I cite a few examples from the results; for more, I refer the reader to the website. Fifty Shades of Grey by E.L. James gets a very poor rating indeed, and those who did not read the book gave it an even lower rating than those who did. This appears to be a trend for many books, with the exception of those books that are seen as the most ‘literary’, which may well be a useful starting point for further research. The top-ten list of ‘best’ books is almost identical to the top-ten list of ‘most literary’ books, with a few exceptions. Both lists contain only male authors. Female authors account for the entire top-ten of ‘least literary’ books. Women dominate the list of top-ten ‘worst’ books, but they are found side by side with James Worthy and Kluun. An initial assessment of the appraisal of a number of titles suggests that the respondents consider translated books in a particular genre to be slightly better than the original Dutch novels in the same genre. These are all very intriguing observations, and we are currently investigating them in more depth.

In The Riddle, all project team members are working on answering the main research questions. The three PhD students also have their own topics, however, working on them within broader frameworks for their PhD theses. Andreas van Cranenburgh is working in the area of computational linguistics, focusing on research into the analysis of sentence structure and patterns. He is also developing a measure of syntactic complexity. Corina Koolen is using her expertise in literary studies and computational linguistics to identify differences in the use and function of descriptions of the physical attributes of characters in novels rated differently by the survey respondents. Kim Jautze is combining approaches from linguistic and literary studies in stylistic research into the literary thriller. The project members frequently present their work at conferences. For example, we were represented with two papers at the first annual DHBenelux digital humanities conference, which took place in June 2014 in The Hague. We presented two long papers and a poster at the global digital humanities congress DH2014 in Lausanne. Updates on new developments can be found on our project website. We found that many people, both in the Netherlands and abroad, are closely following our project. This tells us we are not the only ones eager to see the end results.



Website of the Riddle of Literary Quality project:
Website of the National Readers’ Survey:
Website DH2014, Lausanne:
Website DHBenelux:
Webpage of Andreas van Cranenburgh:
Blog by Corina Koolen:
Blog by Kim Jautze:



Kim Jautze, Corina Koolen, Andreas van Cranenburgh and Hayco de Jong. ‘From high heels to weed attics: a syntactic investigation of chick lit and literature’. In: Proceedings of the Workshop on Computational Linguistics for Literature (co-located with the conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta (US), 10-14 June 2013), pp. 72-81.

Kim Jautze, Karina van Dalen-Oskam, Erica Nagelhout, Hayco de Jong, Corina Koolen, Gertjan Filarski and The Riddle of Literary Quality Team. ‘The development of a large online survey of readers’ opinions’ (under review).

Kim Jautze, Corina Koolen, Andreas van Cranenburgh and Karina van Dalen-Oskam. ‘Meetbare genrekenmerken. The Riddle of Literary Quality’ (in preparation).

Willie Van Peer (ed.). The quality of literature. Linguistic studies in literary evaluation. Amsterdam/Philadelphia: John Benjamins, c. 2008 (Linguistic approaches to literature (LAL), Volume 4).

Renate von Heydebrand and Simone Winko. Einführung in die Wertung von Literatur. Systematik – Geschichte – Legitimation. Paderborn etc.: Ferdinand Schoeningh, 1996.

W.J.J. Pijnenburg et al. Vroegmiddelnederlands Woordenboek. Woordenboek van het Nederlands van de dertiende eeuw in hoofdzaak op basis van het Corpus-Gysseling. Leiden/Groningen: Gopher Publishers, 2001. Four volumes. Available online.
