<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Literary Quality</title>
	<atom:link href="http://literaryquality.huygens.knaw.nl/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://literaryquality.huygens.knaw.nl</link>
	<description></description>
	<lastBuildDate>Thu, 03 May 2012 09:05:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>EACL 2012, Avignon</title>
		<link>http://literaryquality.huygens.knaw.nl/?p=266</link>
		<comments>http://literaryquality.huygens.knaw.nl/?p=266#comments</comments>
		<pubDate>Thu, 03 May 2012 08:47:00 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?p=266</guid>
		<description><![CDATA[Last week, Andreas van Cranenburgh attended the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL is one of the foremost conferences in the field of computational linguistics; this year&#8217;s acceptance rate was well below &#8230; <a href="http://literaryquality.huygens.knaw.nl/?p=266">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Last week, Andreas van Cranenburgh attended the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). EACL is one of the foremost conferences in the field of computational linguistics; this year&#8217;s acceptance rate was well below 30%. There were only 5 other works about parsing, and all of these were about dependency parsing instead of constituency parsing (different representations for expressing the syntactic structure of sentences). This meant that Andreas&#8217;s <a title="poster Andreas" href="http://staff.science.uva.nl/~acranenb/eacl2012poster.pdf" target="_blank">poster on discontinuous parsing</a> was the only one to focus on constituency structures, which are commonly used in Data-Oriented Parsing and thus relevant to our project.</p>
<p>One paper in particular stood out due to its relevance to our project: <a href="http://aclweb.org/anthology/E/E12/E12-1065.pdf" target="_blank">Character-based kernels for novelistic plot structure</a>. The paper presented a method to analyze and compare the plot structure of novels. For example, the relations of characters in a social network can be extracted, as well as their &#8216;emotional development&#8217; based on a list of emotion-related words. The resulting information is used to produce a similarity metric for texts. One graph, for example, plotted the emotions of the protagonist of a Jane Austen novel, showing strong peaks corresponding to a proposal, elopement, and marriage of the protagonist. It is encouraging to see that even a relatively superficial linguistic analysis can reveal interesting details of literary texts.</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;p=266</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parsing in Avignon</title>
		<link>http://literaryquality.huygens.knaw.nl/?p=263</link>
		<comments>http://literaryquality.huygens.knaw.nl/?p=263#comments</comments>
		<pubDate>Tue, 17 Apr 2012 15:27:29 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?p=263</guid>
		<description><![CDATA[Riddle PhD-student Andreas van Cranenburgh will present a poster at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). The conference will be held at the University of Avignon from April 23 to April 27, &#8230; <a href="http://literaryquality.huygens.knaw.nl/?p=263">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Riddle PhD student Andreas van Cranenburgh will present a poster at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). The conference will be held at the University of Avignon from April 23 to April 27, 2012. The title of the paper is ‘Efficient parsing with linear context-free rewriting systems.’ It presents results on parsing with discontinuous constituents from his Master&#8217;s thesis, defended in October 2011. More about the conference can be found at <a href="http://www.eacl.org/" target="_blank">http://www.eacl.org/</a>. A pre-print of the paper is available at <a href="http://staff.science.uva.nl/~acranenb/eacl2012.pdf" target="_blank">http://staff.science.uva.nl/~acranenb/eacl2012.pdf</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;p=263</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A dark and stormy night</title>
		<link>http://literaryquality.huygens.knaw.nl/?p=228</link>
		<comments>http://literaryquality.huygens.knaw.nl/?p=228#comments</comments>
		<pubDate>Mon, 30 Jan 2012 14:08:07 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
				<category><![CDATA[General information]]></category>

		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?p=228</guid>
		<description><![CDATA[Most people will recognize the phrase &#8220;It was a dark and stormy night&#8221; from the Peanuts comics, with Snoopy typing this sentence on his typewriter. Snoopy&#8217;s creator Charles M. Schulz, however, referred to a famous first sentence of a nineteenth-century &#8230; <a href="http://literaryquality.huygens.knaw.nl/?p=228">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Most people will recognize the phrase &#8220;It was a dark and stormy night&#8221; from the Peanuts comics, with Snoopy typing this sentence on his typewriter. Snoopy&#8217;s creator Charles M. Schulz, however, referred to a famous first sentence of a nineteenth-century novel written by Edward Bulwer-Lytton. The sentence even has <a title="Page of stormy night" href="http://en.wikipedia.org/wiki/It_was_a_dark_and_stormy_night">its own Wikipedia page</a>. The sentence is famous because it is seen as the ultimate example of what is often considered a bad style of writing. This paradox, a sentence famous for being bad, makes it a perfect illustration for the project The Riddle of Literary Quality. The illustration was designed by communication expert Johan Kwantes.</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;p=228</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Protected: Z</title>
		<link>http://literaryquality.huygens.knaw.nl/?page_id=233</link>
		<comments>http://literaryquality.huygens.knaw.nl/?page_id=233#comments</comments>
		<pubDate>Fri, 27 Jan 2012 09:08:00 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=233</guid>
		<description><![CDATA[There is no excerpt because this is a protected post.]]></description>
			<content:encoded><![CDATA[<form action="http://literaryquality.huygens.knaw.nl/wp-pass.php" method="post">
<p>This post is password protected. To view it please enter your password below:</p>
<p><label for="pwbox-233">Password:<br />
<input name="post_password" id="pwbox-233" type="password" size="20" /></label><br />
<input type="submit" name="Submit" value="Submit" /></p></form>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=233</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Official start of The Riddle</title>
		<link>http://literaryquality.huygens.knaw.nl/?p=223</link>
		<comments>http://literaryquality.huygens.knaw.nl/?p=223#comments</comments>
		<pubDate>Fri, 27 Jan 2012 08:36:40 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
				<category><![CDATA[General information]]></category>

		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?p=223</guid>
		<description><![CDATA[The project The Riddle of Literary Quality officially started on January 15th 2012. The kick-off meeting with all project members and a communication expert was held at Huygens ING in The Hague on Wednesday 18 January 2012. The press release &#8230; <a href="http://literaryquality.huygens.knaw.nl/?p=223">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The project The Riddle of Literary Quality officially started on January 15th 2012. The kick-off meeting with <a href="http://literaryquality.huygens.knaw.nl/?page_id=15" title="Project team">all project members</a> and a communication expert was held at Huygens ING in The Hague on Wednesday 18 January 2012. The press release from the 19th resulted in a couple of <a href="http://literaryquality.huygens.knaw.nl/?page_id=53" title="Reactions in Dutch">reactions in Dutch blogs</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;p=223</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Riddle of Literary Quality</title>
		<link>http://literaryquality.huygens.knaw.nl/</link>
		<comments>http://literaryquality.huygens.knaw.nl/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 15:39:12 +0000</pubDate>
		<dc:creator>ARL</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=179</guid>
		<description><![CDATA[The Riddle of Literary Quality is a research project of the Huygens Institute for the History of the Netherlands in collaboration with the Fryske Akademy and the Institute for Logic, Language and Computation (University of Amsterdam). The Riddle officially started &#8230; <a href="http://literaryquality.huygens.knaw.nl/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>The Riddle of Literary Quality</em> is a research project of the <a title="http://www.huygens.knaw.nl/" href="http://www.huygens.knaw.nl/" target="_blank">Huygens Institute for the History of the Netherlands</a> in collaboration with the <a title="http://www.fryske-akademy.nl/" href="http://www.fryske-akademy.nl/" target="_blank">Fryske Akademy</a> and the <a title="http://www.illc.uva.nl/" href="http://www.illc.uva.nl/" target="_blank">Institute for Logic, Language and Computation</a> (University of Amsterdam). <em>The Riddle</em> officially started January 15th 2012 and will run for four years. The project is funded by the <a title="http://ehumanities.nl/projects/" href="http://ehumanities.nl/projects/" target="_blank">Computational Humanities Programme</a> of the <a title="http://www.knaw.nl/" href="http://www.knaw.nl/" target="_blank">Royal Netherlands Academy of Arts and Sciences</a>.</p>
<p>The summary of the project follows below. Elsewhere you can find links to the other parts of the project proposal and to reactions to the project. The links to Dutch-language information and reactions are presented on a separate page. Comments can only be added by project team members and members of the think tank.</p>
<p><strong>Summary of the project</strong></p>
<p>Literary quality is one of the most fascinating issues in Literary Studies. Scholars have found that social and cultural factors play an important role in the acceptance of a work as literary or non-literary and as good or bad. In the project “The Riddle of Literary Quality” we assume, however, that formal characteristics of a text may also be of importance in calling a fictional text literary or non-literary, and good or bad – non-literary texts can also be good and literary texts can also be bad. Many formal characteristics can be thought of as having a part in this, e.g. the use of difficult words, the number of adjectives and adverbs, or complex syntactic style. This project explores this assumption, integrating the analysis of low-level lexical-statistical features and high-level syntactic and narrative features. The main results that will come out of this project are: (1) a list of formal characteristics and their distribution in a training corpus of differently valued Modern Dutch novels, (2) an evaluation of other Modern Dutch novels based on the results of the training corpus, and (3) results of first experiments of the application of the same measurements on novels from another time period or language. The first two will be described in publications, and the third will take the form of a project plan for a new research program to adapt the tools for diachronic and cross-language application, to make the method applicable to longitudinal research and to the comparison of formal characteristics of literary quality in different languages.</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=179</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reactions</title>
		<link>http://literaryquality.huygens.knaw.nl/?page_id=59</link>
		<comments>http://literaryquality.huygens.knaw.nl/?page_id=59#comments</comments>
		<pubDate>Sun, 04 Sep 2011 13:27:21 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=59</guid>
		<description><![CDATA[Announcement/summary of the project at the site of the eHumanities Group]]></description>
			<content:encoded><![CDATA[<p><a title="http://ehumanities.nl/computational-humanities/" href="http://ehumanities.nl/computational-humanities/">Announcement/summary of the project</a> at the site of the eHumanities Group</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=59</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Riddle in Dutch</title>
		<link>http://literaryquality.huygens.knaw.nl/?page_id=53</link>
		<comments>http://literaryquality.huygens.knaw.nl/?page_id=53#comments</comments>
		<pubDate>Sun, 04 Sep 2011 13:04:18 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=53</guid>
		<description><![CDATA[2012, februari De humanities gaan nu ook e. Sally Wyatt in e-Data &#38; research 6 (2012), februari, p. 6 2012, 1 feb.  Interview met Rens Bod in Folia Magazine 19, 01/02/2012 (Folia_p_9) 2012, 23 jan.   Reactie op persbericht en op &#8230; <a href="http://literaryquality.huygens.knaw.nl/?page_id=53">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>2012, februari De humanities gaan nu ook e. <a title="Sally Wyatt over eHumanities programma" href="http://www.edata.nl/0603_010212/pdf/0603_010212_6.pdf">Sally Wyatt in e-Data &amp; research 6 (2012), februari, p. 6</a><br />
2012, 1 feb.  Interview met Rens Bod in Folia Magazine 19, 01/02/2012 (<a href="http://literaryquality.huygens.knaw.nl/?attachment_id=253" rel="attachment wp-att-253">Folia_p_9</a>)<br />
2012, 23 jan.   Reactie op persbericht en op Marc van Oostendorp in <a title="Contrabas" href="http://www.decontrabas.com/de_contrabas/2012/01/wat-is-literatuur-het-antwoord.html">De Contrabas</a><br />
2012, 23 jan.   Reactie op persbericht van <a title="Oostendorp" href="http://nederl.blogspot.com/2012/01/col-een-literaire-succesformule.html">Marc van Oostendorp in Neder-L</a><br />
2012, 19 jan.   <a href="http://www.huygens.knaw.nl/project-the-riddle-of-literary-quality-van-start/#more-3834">Persbericht Huygens ING</a> bij de officiële start van het project<br />
2011, 1 sept.   Reactie op aankondiging door<a title="Stolk" href="http://www.textualscholarship.nl/?p=9306"> Fabian Stolk op Textualscholarship.nl</a><br />
2011, 7 juli   <a title="http://www.textualscholarship.nl/?p=9103" href="http://www.textualscholarship.nl/?p=9103">Aankondiging/samenvatting project</a> op Textualscholarship.nl<br />
2011, 7 juli   <a title="http://www.huygens.knaw.nl/huygens-ing-start-onderzoek-naar-literaire-kwaliteit/#more-2605" href="http://www.huygens.knaw.nl/huygens-ing-start-onderzoek-naar-literaire-kwaliteit/#more-2605">Aankondiging/samenvatting project</a> op Huygens ING website</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=53</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>News</title>
		<link>http://literaryquality.huygens.knaw.nl/?page_id=51</link>
		<comments>http://literaryquality.huygens.knaw.nl/?page_id=51#comments</comments>
		<pubDate>Sun, 04 Sep 2011 12:56:11 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=51</guid>
		<description><![CDATA[Vacancies]]></description>
			<content:encoded><![CDATA[<p><a title="http://literaryquality.huygens.knaw.nl/?page_id=100" href="http://literaryquality.huygens.knaw.nl/?page_id=100">Vacancies</a></p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=51</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Proposal</title>
		<link>http://literaryquality.huygens.knaw.nl/?page_id=36</link>
		<comments>http://literaryquality.huygens.knaw.nl/?page_id=36#comments</comments>
		<pubDate>Sun, 04 Sep 2011 12:14:34 +0000</pubDate>
		<dc:creator>KDO</dc:creator>
		
		<guid isPermaLink="false">http://literaryquality.huygens.knaw.nl/?page_id=36</guid>
		<description><![CDATA[Research aims Research into quality of literary texts is extremely unusual (Von Heydebrand &#38; Winko 1996, Albers 2007, Van Peer 2008, Louwerse et al. 2008). The project builds on stylistic research usually related to authorship attribution, but wants to refocus &#8230; <a href="http://literaryquality.huygens.knaw.nl/?page_id=36">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Research aims</strong></p>
<p>Research into the quality of literary texts is extremely unusual (Von Heydebrand &amp; Winko 1996, Albers 2007, Van Peer 2008, Louwerse et al. 2008). The project builds on stylistic research usually related to authorship attribution, but wants to refocus the methods and techniques used there on quality discrimination, which is new. The basic assumption underlying this research is that literary quality is not only decided by social and cultural factors, but also by formal characteristics of the texts which are being evaluated. It is the formal part of quality that will be dealt with in this research, applying both low-level and high-level pattern recognition. By low-level patterns we mean all features that are directly observable in texts, such as word type frequency, average sentence length, vocabulary distribution. By high-level patterns we mean all features that are not directly observable in texts, such as syntactic structure, semantic meaning, motifs and narrative structure.</p>
<p>Low-level patterns in literary texts have been studied relatively often in the context of authorship attribution and occasionally also for stylistic research (Hoover 2010). High-level pattern recognition has hardly been applied to literary texts, which is why we will focus on a closer description of this method in the proposal. Raghavan, Kovashka and Mooney (Raghavan et al. 2010) have recently shown that both high-level syntactic structure and low-level lexical information are useful in capturing an author’s overall writing style. They used a probabilistic context-free grammar (PCFG) to model syntactic information for the problem of authorship attribution. Therefore it seems promising to further investigate the usefulness of syntactic structure and other high-level patterns for the problem of style and literary quality in general. We propose to use not only PCFGs for this problem, but also an extension of PCFGs that can include richer syntactic context: probabilistic tree-substitution grammars (PTSGs), also known as data-oriented parsing (DOP) models (Bod et al. 2003, Bod 2009). These models operate with productive units that go beyond the limited context of the rules of PCFGs. Instead, the units of PTSGs are subtrees of (in principle) arbitrary size and thus include PCFGs as special cases. Still, there exist efficient algorithms to parse with PTSGs. Bansal and Klein (2010) have recently demonstrated that PTSGs have a large number of advantages over the lexically-insensitive PCFGs. The VICI-group led by Rens Bod at the ILLC of the University of Amsterdam has long-standing expertise with PTSGs (among others Scha 1990, Bod 1992, 2009, Sima’an 1996, Zuidema 2006, Sangati et al. 2010). The goal of this project is therefore to explore structural, syntactic and narrative properties of style and literary quality and to integrate these with the existing low-level stylistic features. The first experiments by Raghavan et al. (2010) have shown that success in this area can be achieved, which suggests that extensions towards richer structure, such as tree-substitution grammars, may enhance the prediction of style and quality.</p>
<p>Apart from finding an answer to the literary research question, the main challenge of the proposal is to find out to what degree the analysis of different pattern levels contributes to stylistics in general and to insight into literary quality in particular, and to which other possibilities the results might lead.</p>
<p><strong>Context</strong></p>
<p>The research closely links to the project Authorship Attribution and Stylistics at the Huygens Instituut for the History of the Netherlands (project leader Karina van Dalen-Oskam) and to the research collaboration with David L. Hoover, visiting professor at the Huygens Instituut in 2010, who agreed to be part of the Think Tank for this project (Hoover 1999, 2010, Van Dalen-Oskam &amp; Van Zundert 2007, 2008, Kestemont &amp; Van Dalen-Oskam 2009). The project fits in the research focus point for literature as formulated in the Fryske Akademy Master Plan. Related research at the Huygens Instituut for the History of the Netherlands is done by Peter Boot, who focuses on studying the social processes of validation of literature in the preparation of a large grant proposal on this topic.</p>
<p><strong>Research Plan</strong></p>
<p>The project The Riddle of Literary Quality will research whether there are similarities in formal characteristics of texts considered to be &#8216;literary&#8217; or &#8216;non-literary&#8217;. For a long time, the consensus among literary scholars was that literary quality was attributed in social communication. Recently, however, wanting to extend the cultural and historical explanations, they have taken the question of additional formal features into account again (Harris 1995, McDonald 2007, Vaessens 2009). The general audience usually equates &#8216;literary&#8217; with &#8216;good&#8217; and &#8216;non-literary&#8217; with &#8216;bad&#8217;. Literary researchers are of a different opinion: they would rather make a distinction between types of readers or reader roles, arguing that the kind of texts read by these readers also come in categories of &#8216;good&#8217; and &#8216;bad&#8217; and anything in between (and that readers may change roles whenever they like). The distinction used in this project is the one drawn up by Von Heydebrand &amp; Winko (1996, the most recent and a very thorough monograph on literary evaluation) between two types of reader roles: autonomous, when a reader focuses on formal and aesthetical characteristics of the text, and heteronomous, when a reader focuses on the content and relates this to his or her own personal experiences. Based on the analysis of a reader survey which will be conducted in the first phase of the project a training corpus will be created. This corpus will consist of texts clearly preferred by autonomous readers and texts clearly preferred by heteronomous readers. The texts will also be categorized as being highly preferred or least preferred in comparison to other texts in the same group.</p>
<p>In the analysis of formal elements of the fictional texts in the training corpus, the expectation is that there are characteristics which will show up more in texts preferred by autonomous readers than in texts preferred by heteronomous readers (or the other way round). This result would agree with the idea of the general public; it will lead to a list of formal tendencies in &#8216;literary&#8217; or &#8216;non-literary&#8217; texts. There is also another possible outcome, namely that some formal characteristics will turn up significantly more in the most popular of both types of texts and significantly less in the least successful of them (or the other way round). This would favour the opinion of literary scholars about the quality of texts independent of the types of readers. What we expect to find in this project, however, is a combination of these, e.g. a placement of a text anywhere on two axes, with autonomous &#8211; heteronomous and high preference &#8211; low preference as their extreme points.</p>
<p>Some predictions are:<br />
-fictional texts preferred by heteronomous readers tend to have a smaller vocabulary than those preferred by autonomous readers.<br />
-fictional texts preferred by autonomous readers tend to have a more complex syntactic structure than those preferred by heteronomous readers.<br />
-fictional texts that are least preferred by both kinds of readers tend to have a more complex syntactic structure than texts most preferred by both kinds of readers.</p>
<p>These predictions may seem contradictory as to the role of syntactic complexity. The expectation is, however, that <em>the combination of formal characteristics</em> will be different every time. The co-occurrence of syntactic complexity with high vocabulary richness could therefore be an indication of a text highly preferred by autonomous readers.<br />
The texts in the training corpus are all contemporary Dutch novels. In the second phase of the project, the tools measuring the formal characteristics will be applied to a much larger corpus of Modern Dutch fictional texts and each of the analysed texts will be placed on the two axes mentioned above: autonomous &#8211; heteronomous versus high preference &#8211; low preference. The resulting placements will be analysed; this is expected to lead to a refinement of the tools and a new iteration of measurements, and eventually to a synthesis in publications and hypotheses for further research and development.</p>
<p>One of the main further research questions is clearly whether the formal analysis will be able to help in looking back to earlier fictional texts and finding out more about the formation of and changes in the canon of literary works through time. We expect that the preferences of readers change through time and in different cultural environments. Many of the characteristics to be measured are usually assumed to be language independent. Could changing preferences be reflected in the formal characteristics of the fictional texts that were being published in the past? Can we make a distinction between preferred and less-preferred, autonomous and heteronomous without being able to conduct a survey of readers from the past? We do not expect to answer all these questions. We do want to map the problems and possibilities based on first experiments and draw up a project plan for follow-up research which will go into diachronic, longitudinal research and cross-language comparability.<br />
We will therefore apply the tools to a set of older Dutch novels as well as to sets of novels written in other languages, limiting the choice of languages for now to two of the closest relatives of Dutch, Frisian and English. We will first make predictions about the placement of the texts to be analysed on the two axes, seen from the point of view of contemporary reviewers (19th-century reviewers of Dutch or American-English novels) and from the point of view of current scholars. Preferably we will select texts of which we know that their evaluation changed over time: much preferred then, now forgotten, or the other way round. By testing this on different languages as well as a different time period we can keep an eye on changing patterns through time and language. This is expected to yield the necessary information to draw up new research plans.</p>
<p><strong>Deliverables</strong>: (1) A list of formal characteristics and their distribution in a training corpus of differently valued Modern Dutch novels (publications); (2) An evaluation of other Modern Dutch novels based on the results of the training corpus (publications); (3) Results of first experiments of the application of the same measurements on novels from another time period and language (project plan for a new research program to adapt the tools for diachronic and cross-language application); (4) Texts (if possible, see the &#8220;Data paragraph&#8221;) and tools available online, including the documentation of the new computational techniques.</p>
<p><strong>Work process</strong>: The tasks in the project will be executed iteratively where next steps are based on previous results, e.g. when it proves that the tool applied needs fine-tuning and re-application. This holds for most of the tasks described below.</p>
<p><strong>Main tasks </strong>(Q = Period of three months)<strong></strong></p>
<p><strong>1.</strong> <strong>Establishing the contents of the Modern Dutch training corpus</strong> (Q1)<br />
The training corpus is of major importance for the success of the project. It will consist of two sets of novels, based on the distinction Von Heydebrand &amp; Winko (1996) make between two types of reader roles: autonomous and heteronomous, and a marking of high or low preference compared to other texts in the same group. The procedure for text selection will be informed by empirical psychology (Bortolussi &amp; Dixon 2003, Bryant &amp; Vorderer 2006) by means of surveys of readers from different backgrounds, making use of online survey possibilities (web based crowd sourcing possibilities will be considered). Readers will be asked to answer many questions which will help us to select texts for the different groups. It is essential that the survey eliminates possible confounds, which is why it is seen as necessary to include an experimental psychologist in the Think Tank, who will not only guide the preparation of the survey, but also co-author publications related to the project on the more wide-ranging results of the survey.</p>
<p><strong>2.</strong> <strong>Data curation and preparation of the texts of the training corpus</strong> (Q1-2)<br />
The survey mentioned under Task 1 will list, among others, those novels that are already available digitally at Huygens Instituut or as raw OCR at the University of Tilburg where they are being considered for the SoNaR project. SoNaR is a project initiated by the STEVIN program for language and speech technology (<a title="http://lands.let.ru.nl/projects/SoNaR/description.html" href="http://lands.let.ru.nl/projects/SoNaR/description.html">http://lands.let.ru.nl/projects/SoNaR/description.html</a>) and aims to create a 500 million word reference corpus of Modern Dutch written sources. Literary texts will be part of the corpus as well. Martin Reynaert (ILK &#8211; UvT), the coordinator of the work package Corpus Building in SoNaR, has suggested and agreed to the following task division: Huygens Instituut will digitize the novels needed for the project The Riddle of Literary Quality (in house, no extra budget needed) and correct the OCR of relevant novels already scanned by Reynaert. Reynaert will cover the legal aspects of the electronic texts: the SoNaR team will put in time to arrange permission for inclusion of those digital texts in SoNaR. This means the texts can be legally used for any kind of research (which is especially relevant for research into literary quality, assuming that authors or publishers may not all be equally willing to give permission for their texts to be researched from the perspective of this topic). Methods for establishing text quality in general will be looked into.</p>
<p><strong>3.</strong> <strong>Selection of low-level tools (measures and algorithms to apply)</strong> (Q1-3)<br />
The relevant measures and patterns will be selected. The list will include such features as mean word length, mean number of syllables per word, mean sentence length, mean paragraph length, lexical richness (measuring how many different words (dictionary entries) a text has), use and distribution of parts of speech, use of named entities, dialogue versus narrative, and any others that are suggested in the literature and available in, e.g., the JGAAP set of tools (cf. Task 4, (2)b, <a title="http://evllabs.com/jgaap/w/index.php/Documentation" href="http://evllabs.com/jgaap/w/index.php/Documentation">http://evllabs.com/jgaap/w/index.php/Documentation</a> and Juola 2006a, 2006b). Tools specifically developed for Dutch by the Induction of Linguistic Knowledge group at Tilburg University will also be used. They are available under the open-source GPL license and will also be part of the webservices to be provided by CLARIN-NL (cf. <a title="http://www.clarin.nl" href="http://www.clarin.nl">http://www.clarin.nl</a>). We will not go into this in detail here, but focus our description on the most innovative task in the project, Task 4.</p>
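<p>As a rough illustration of the low-level measures listed under Task 3, the following Python sketch computes mean word length, mean sentence length and a simple lexical-richness score for a plain-text passage. It is not taken from JGAAP or the ILK tools: the function name <code>low_level_features</code>, the regular-expression tokenisation and the use of a type-token ratio over word forms (rather than dictionary entries) are all simplifying assumptions made for this example.</p>
<pre><code class="language-python">import re


def low_level_features(text):
    """Return a few simple surface measures for a plain-text passage."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    # Assumption: a word is any run of letters, digits or underscores, lower-cased.
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
        # Distinct word forms divided by all word tokens (a stand-in for
        # lexical richness; the proposal counts dictionary entries instead).
        "type_token_ratio": len(set(words)) / len(words),
    }


if __name__ == "__main__":
    print(low_level_features("The cat sat. The cat ran."))
</code></pre>
<p>For the passage "The cat sat. The cat ran." the function reports a mean word length of 3.0 characters, a mean sentence length of 3.0 words and a type-token ratio of 4/6. A realistic version would replace the naive splitting with a proper Dutch tokeniser and lemmatiser.</p>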
<p><strong>4.</strong> <strong>Selection and start of the development of the high-level tools (measures and algorithms to apply)</strong> (Q1-3 start, continuing throughout the whole project)<br />
This task is the computationally most innovative part of the project, which is why we will describe it in more detail than the other tasks. The iterative aspect of the work process will be most prominent in this task. Work will start at the very beginning of the project and continue throughout its four years. Most of the work of the postdoc and the developer will go into this task.<br />
The high-level tools will be based on Probabilistic Tree-Substitution Grammars (PTSGs) as developed within the Data-Oriented Parsing (DOP) framework (cf. Bod et al. 2003). PTSGs subsume Probabilistic Context-Free Grammars (PCFGs), which arise as the special case in which the syntactic dependencies are restricted to one level of constituent structure. The tools for learning PTSGs are available via the DOP homepage: <a title="http://staff.science.uva.nl/~rens/dop.html" href="http://staff.science.uva.nl/%7Erens/dop.html">http://staff.science.uva.nl/~rens/dop.html</a>. PTSGs and PCFGs can be derived either in a supervised manner or in an unsupervised manner from flat, unannotated data (the latter known as U-DOP, see Bod 2007, 2009, Smets 2010). Since enough flat data is available for Dutch, unsupervised parsing techniques (U-DOP) are preferable: they avoid the need to annotate training sets manually, while still being able to induce trees with syntactic categories (cf. Smets 2010). Both approaches will be tested, however. Given the success of the simplest PTSGs for style recognition (i.e. PCFGs tested by Raghavan et al. 2010), it is likely that models richer than PCFGs will enhance the prediction of style. To apply these tools and algorithms to the analysis of literary quality, the following extensions will be developed and tested:<br />
<strong>(1)</strong> Develop and test the usefulness of various (novel) notions of <em>syntactic complexity</em> (as motivated above) for literary style/quality:<br />
<strong>a.</strong> Define and test syntactic complexity as the number of nodes in the syntactic tree of an utterance, normalized by sentence length: the more nodes per word, the more complex the syntax of the sentence. (This is a very simple metric, but it will be useful as a baseline.)<br />
<strong>b.</strong> Define and test syntactic complexity as the number of subtrees with which the induced PTSG generates a tree: the fewer subtrees that are needed, the more repetitive and the less complex a sentence or text is. This can be tested both internally and externally: if the PTSG is induced from external (i.e. held-out) data, the text can be tested for &#8216;originality&#8217; versus &#8216;repetitiveness&#8217; using this measure. The number of conventional phrases can also be detected in this way.<br />
<strong>c.</strong> Parse &#8216;high literature&#8217; (autonomous) with a PTSG trained on &#8216;low literature&#8217; (heteronomous) and vice versa. The percentage of high-literary sentences that can be parsed by PTSGs trained on low literature can be seen as a measure of <em>literary complexity</em>. The opposite may also be interesting to test: can low-literary sentences always be parsed by PTSGs that are trained on high-literary sentences? If so, we can conclude that low-literary sentences are ‘included’ in high-literary sentences. If not, low-literary sentences are to some extent ‘different’ from high-literary sentences (which can be quantified by the percentage of parsed sentences). This will also be tested on and compared to a small set of novels for children and adolescents.<br />
<strong>(2)</strong> Adapt PTSGs (DOP and U-DOP) to literary parsing.<br />
<strong>a.</strong> During the last few years, both unsupervised and supervised PTSGs have been tested on large-scale text corpora in Bod’s VICI group (see e.g. Bod 2009; Sangati et al. 2010, Smets 2010). However, training on literary texts may raise interesting new problems. The two servers available in Bod’s VICI group each have 256 GB of internal memory (RAM) and can (incrementally) induce PTSGs for over 100 million words. Once the PTSGs are induced, they can run efficiently on smaller platforms (with 48 GB of internal memory). It is expected that the UvA algorithms can be applied straightforwardly to induce PTSGs for literary texts (made available by Huygens Instituut), so that they can be used directly for the definitions of syntactic complexity above.<br />
<strong>b.</strong> Integrate the syntactic property of complexity as an additional feature in the JGAAP framework (Juola 2006). Although syntactic complexity can be tested separately, as proposed under (1), we also want to integrate these measures as (weighted) features in the general JGAAP model for style recognition. This may result in a fine-grained comparison between a wide range of different textual properties and patterns, leading to the first stylistic analysis that integrates low-level and high-level patterns.<br />
<strong>c.</strong> Investigate an extension towards story grammars. Recent work in the ILLC by Löwe et al. (2009) on story grammars shows that discourse structure can reveal underlying narrative building blocks (‘narratemes’) as well as motifs and plots. Although full-fledged story analysis is still in its infancy, it may contribute to finding even <em>higher</em>-level patterns, such as plot lines of narratives. Success is not guaranteed for this part of the project, but it is promising enough to be tested. It should be stressed that applying the tools above will already result in a successful project, and that the use of more sophisticated state-of-the-art techniques makes the project computationally and intellectually more fascinating.</p>
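<p>The baseline measure under (1)a of Task 4 can be sketched in a few lines of Python. The sketch assumes that parse trees are available as Penn-style bracketed strings and counts only labelled nodes (phrase and part-of-speech labels), not the words themselves. The function name <code>nodes_per_word</code> and both of these choices are assumptions made for illustration; they are not part of the DOP tools.</p>
<pre><code class="language-python">def nodes_per_word(bracketed):
    """Labelled nodes in a bracketed tree divided by its number of words."""
    # Put spaces around brackets so that a plain split() yields the tokens.
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    labelled_nodes = 0
    words = 0
    previous = None
    for token in tokens:
        if token == "(":
            # Every opening bracket introduces exactly one labelled node.
            labelled_nodes += 1
        elif token != ")" and previous != "(":
            # A token directly after "(" is a label; any other token is a word.
            words += 1
        previous = token
    if words == 0:
        raise ValueError("tree contains no words")
    return labelled_nodes / words


if __name__ == "__main__":
    print(nodes_per_word("(S (NP (DT the) (NN cat)) (VP (VBD sat)))"))
    print(nodes_per_word("(S (DT the) (NN cat) (VBD sat))"))
</code></pre>
<p>Under these assumptions, <code>(S (NP (DT the) (NN cat)) (VP (VBD sat)))</code> has six labelled nodes and three words, giving 2.0, whereas the flatter <code>(S (DT the) (NN cat) (VBD sat))</code> gives 4/3. Averaging the score over all sentences of a novel would yield one baseline complexity value per text.</p>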
<p><strong>5.</strong> <strong>Integrating tools in a simple user-interface</strong> (Q2-5 and continuing throughout the project)<br />
To make the tools usable and testable by the literary scholars linked to the project, a simple online interface will be created. Its form and environment will be chosen according to the criteria of online availability and efficiency on different platforms. A promising candidate is CLAM (Computational Linguistics Application Mediator), developed by ILK at Tilburg University within CLARIN-NL projects with the specific aim of providing a flexible yet thorough solution for turning existing applications into fully-fledged web applications/services. The tools will be integrated as soon as they are available, starting with the low-level pattern recognition tools described under Task 3. The high-level tools under Task 4 will be added as soon as they are developed.</p>
<p><strong>6.</strong> <strong>Applying the tools to the Modern Dutch training corpus</strong> (Q3-6)<br />
The tools will first be tested on the two training sets, mostly those measuring low-level patterns (cf. Task 3) together with the first available high-level pattern recognition tools described in Task 4. The results will be analysed, the tools fine-tuned, and trends and weighting factors identified.</p>
<p><strong>7.</strong> <strong>Description of the preliminary results and registration of predictions for other texts</strong> (Q5-6)<br />
Intermediate results will be discussed in a closed workshop by the project team with, among others, the associated external scholars. Predictions of results for other texts will be listed, thus planning the next steps of the research.</p>
<p><strong>8. Applying the fine-tuned tools to other Modern Dutch texts</strong> (Q6-9, and continuing throughout the project)<br />
The fine-tuned tools as described in Task 6 will be applied to texts or text groups (e.g. &#8216;literary thrillers&#8217;) from the larger text corpus established in collaboration with SoNaR, to find out whether the resulting patterns agree with the predictions made under Task 7. Results will be presented in papers at conferences; competing for the Academic Year Prize will be considered.</p>
<p><strong>9.</strong> <strong>Applying the tools to Modern Frisian and to 19th-century Dutch and American-English novels</strong> (Q9-12, and continuing throughout the project)<br />
To investigate their cross-language applicability, the tools will be applied to a corpus of Modern Frisian novels. The diachronic applicability will be tested on a corpus of American novels from the 19th and the early 20th century as collected by professor David L. Hoover (New York University) and on available 19th-century Dutch novels. We do not expect to be able to address this topic comprehensively, and therefore approach it as an experiment that can show where the possibilities and the problems lie. The experiments, which will probably have an iterative nature, will result in a project plan for follow-up research into canon formation and cross-language comparison of quality characteristics. The Frisian texts will be provided by the Frisian Language Database and by Tresoar (provincial library of Friesland).</p>
<p><strong>10.</strong> <strong>Overall analysis of the results (articles, volume of papers, dissertation)</strong> (Q11-16)<br />
A combined closed and open workshop will be organized for the presentation and discussion of further results. Articles will be published, e.g. in a volume of papers. Other publications and the dissertation will be written. Plans for follow-up research will be written in collaboration with the different partners.</p>
<p><strong>11.</strong> <strong>Release of tools for the wider scholarly community on the website of Huygens Institute for the History of the Netherlands</strong> (Q15-16)<br />
The tools will be made available online through a server of Huygens Instituut (also acting as a CLARIN centre), with documentation and the possibility of instruction sessions for interested scholars. The analysed texts will be made available in the SoNaR corpus.</p>
<p><strong>Added Data paragraph</strong><br />
Sustainability:<br />
The tools to be developed will be archived according to the procedure that is being established by the five cooperating CLARIN centres (among others Huygens ING and DANS).<br />
Open Access:<br />
Copyright is a major obstacle to open access for the novels that will be digitized especially for this project. These will become freely available in the SoNaR corpus if Martin Reynaert can arrange legal permission. It will be investigated whether the data resulting from the measurements can be made available without copyright infringement, to overcome some of the drawbacks of not being able to provide researchers with all digitized novels.<br />
The aim is also to make the tools to be developed available open access through the CLARIN centre infrastructure, in the architecture that Huygens ING is currently developing for webservices and according to the standards to be decided on in the CLARIN environment.<br />
Metadata:<br />
All metadata are expected to become available open access.</p>
<p><strong>References</strong></p>
<p><strong>Albers 2007</strong> Sabine Albers, &#8216;Top or Flop: characteristics of bestsellers&#8217;. In: Lesley Jeffries, Dan McIntyre and Derek Bousfield (Eds.), Stylistics and social cognition. Amsterdam / New York: Rodopi, 2007 (Poetics and Linguistics Association (PALA): 4), p. 205-215<br />
<strong>Bansal and Klein 2010</strong> M. Bansal &amp; D. Klein, &#8216;Simple, Accurate Parsing with an All-Fragments Grammar&#8217;. In: Proceedings ACL 2010, Stroudsburg: Association for Computational Linguistics, p. 110-117<br />
<strong>Bod 1992</strong> R. Bod, &#8216;A Computational Model of Language Performance: Data-Oriented Parsing&#8217;. In: Proceedings COLING 1992, Stroudsburg: Association for Computational Linguistics, p. 855-859<br />
<strong>Bod et al. 2003</strong> R. Bod, R. Scha &amp; K. Sima&#8217;an (Eds.), Data-Oriented Parsing. Stanford: CSLI Publications, 2003<br />
<strong>Bod 2007</strong> R. Bod, &#8216;Is the End of Supervised Parsing in Sight?&#8217; In: Proceedings ACL 2007, Stroudsburg: Association for Computational Linguistics, p. 400-407<br />
<strong>Bod 2009</strong> R. Bod. &#8216;From Exemplar to Grammar: A Probabilistic Analogy-Based Model of Language Learning&#8217;, In: Cognitive Science, 33 (2009), 5, p. 752-793<br />
<strong>Bortolussi &amp; Dixon 2003</strong> Marisa Bortolussi &amp; Peter Dixon, Psychonarratology. Foundations for the empirical study of literary response. Cambridge University Press, 2003<br />
<strong>Bryant &amp; Vorderer 2006</strong> Jennings Bryant &amp; Peter Vorderer (eds.), Psychology of entertainment. (Routledge Communication Series) Routledge, 2006<br />
<strong>CLARIN-NL</strong> <a title="http://www.clarin.nl" href="http://www.clarin.nl">http://www.clarin.nl</a><br />
<strong>Harris 1995</strong> Wendell V. Harris, Literary meaning. Reclaiming the study of literature. New York: Palgrave Macmillan, 1995<br />
<strong>Hoover 1999</strong> David L. Hoover, Language and style in The Inheritors. Lanham etc.: University Press of America, 1999<br />
<strong>Hoover 2010</strong> David L. Hoover, &#8216;Authorial Style&#8217;. In: Dan McIntyre and Beatrix Busse (eds.), Language and Style: Essays in Honour of Mick Short, New York: Palgrave, 2010, p. 250-271<br />
<strong>Juola 2006a</strong> Patrick Juola, John Sofko &amp; Patrick Brennan, &#8216;A Prototype for Authorship Attribution Studies&#8217;. In: Literary and Linguistic Computing 21: 169-178<br />
<strong>Juola 2006b</strong> Patrick Juola, &#8216;Authorship Attribution&#8217;. In: Foundations and Trends in Information Retrieval 1 (2006), 3, p. 233-334; http://dx.doi.org/10.1561/1500000005<br />
<strong>Kestemont &amp; Van Dalen-Oskam 2009</strong> Mike Kestemont &amp; Karina van Dalen-Oskam, Predicting the Past: Memory-Based Copyist and Author Discrimination in Medieval Epics. In Proceedings of the twenty-first Benelux Conference on Artificial Intelligence (BNAIC 2009). Eindhoven, p. 121-128.<br />
<strong>Louwerse et al. 2008</strong> Max Louwerse, Nick Benesh &amp; Bin Zhang, &#8216;Computationally discriminating literary from non-literary texts&#8217;. In: S. Zyngier, M. Bortolussi, A. Chesnokova, J. Auracher (Eds.), Directions in empirical literary studies, Amsterdam: Benjamins, 2008, p. 175-192<br />
<strong>Löwe et al. 2009</strong> Benedikt Löwe, Eric Pacuit and Sanchit Saraf, &#8216;Identifying the Structure of a Narrative via an Agent-based Logic of Preferences and Beliefs: Formalizations of Episodes from CSI: Crime Scene Investigation&#8217;. In: Michael Duvigneau and Daniel Moldt (eds.), MOCA&#8217;09, Fifth International Workshop on Modelling of Objects, Components, and Agents, Hamburg, 2009.<br />
<strong>McDonald 2007</strong> Ronan McDonald, The death of the critic. New York/London: Continuum, 2009 (1st ed. 2007)<br />
<strong>Raghavan et al. 2010</strong> Sindhu Raghavan, Adriana Kovashka, Raymond Mooney, &#8216;Authorship Attribution Using Probabilistic Context-Free Grammars&#8217;. In: Proceedings ACL 2010, <a title="http://www.aclweb.org/anthology/P/P10/P10-2008.pdf" href="http://www.aclweb.org/anthology/P/P10/P10-2008.pdf">http://www.aclweb.org/anthology/P/P10/P10-2008.pdf</a><br />
<strong>Sangati et al. 2010</strong> F. Sangati, W. Zuidema &amp; R. Bod, &#8216;Efficiently extract recurring tree fragments from large treebanks&#8217;. In: Proceedings LREC10, Malta<br />
<strong>Scha 1990</strong> R. Scha, &#8216;Taaltheorie en Taaltechnologie; Competence en Performance&#8217;. In: Q. de Kort &amp; G. Leerdam (Eds.), Computertoepassingen in de Neerlandistiek. Almere: Landelijke Vereniging van Neerlandici, 1990, p. 7-22<br />
<strong>Sima’an 1996</strong> K. Sima&#8217;an, &#8216;Computational complexity of probabilistic disambiguation by means of tree grammars&#8217;. In: Proceedings COLING 1996 Stroudsburg: Association for Computational Linguistics, p. 1175-1180<br />
<strong>Smets 2010</strong> Margaux Smets, &#8216;A DOP-inspired approach to syntactic category induction&#8217;, Paper presented at CLIN 2010 (currently under submission)<br />
<strong>SoNaR</strong> <a title="http://lands.let.ru.nl/projects/SoNaR/" href="http://lands.let.ru.nl/projects/SoNaR/">http://lands.let.ru.nl/projects/SoNaR/</a><br />
<strong>Vaessens 2009</strong> Thomas Vaessens, De revanche van de roman. Literatuur, autoriteit en engagement. Nijmegen: Uitgeverij Vantilt, 2009<br />
<strong>Van Dalen-Oskam &amp; Van Zundert 2007</strong> Karina van Dalen-Oskam &amp; Joris van Zundert, Delta for Middle Dutch – Author and Copyist Distinction in Walewein. In Literary and Linguistic Computing 22 (2007), p. 345-362<br />
<strong>Van Dalen-Oskam &amp; Van Zundert 2008</strong> Karina van Dalen-Oskam &amp; Joris van Zundert, The Quest for Uniqueness: Author and Copyist Distinction in Middle Dutch Arthurian Romances based on Computer-assisted Lexicon Analysis. In Mooijaart, M., van der Wal, M. (eds.) Yesterday&#8217;s words: contemporary, current and future lexicography. [Proceedings of the Third International Conference on Historical Lexicography and Lexicology (ICHLL), 21-23 June 2006, Leiden]. Cambridge: Cambridge Scholars Publishing, p. 292-304<br />
<strong>Van Peer 2008</strong> Willie van Peer (ed.), The Quality of Literature. Linguistic studies in literary evaluation. Amsterdam: John Benjamins, 2008 (Linguistic Approaches to Literature 4)<br />
<strong>Von Heydebrand &amp; Winko 1996</strong> Renate von Heydebrand &amp; Simone Winko, Einfuehrung in die Wertung von Literatur. Systematik &#8211; Geschichte &#8211; Legitimation. Paderborn etc.: Ferdinand Schoeningh, 1996<br />
<strong>Zuidema 2007</strong> W. Zuidema, &#8216;Parsimonious data-oriented parsing&#8217;. In: Proceedings EMNLP 2007, Stroudsburg: Association for Computational Linguistics, p. 551-560</p>
]]></content:encoded>
			<wfw:commentRss>http://literaryquality.huygens.knaw.nl/?feed=rss2&#038;page_id=36</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

