NEDERBOOMS D3.1 Case Study on NP/PP Alternation Liesbeth Augustinus
[email protected] Vincent Vandeghinste
[email protected] Frank Van Eynde
[email protected] Centre for Computational Linguistics KULeuven 2012
1 Introduction
The case study presented here is a treebank-based investigation of the indirect object in Dutch. Special attention is paid to np/pp alternation, a phenomenon also known as dative alternation.1 np/pp alternation has been discussed extensively in the literature on Dutch; see, amongst others, Haeseryn et al. [1997], Van Belle and Van Langendonck [1996], and Van der Beek [2005]. The following examples show that Dutch indirect objects can take the form of an np (1) or a pp (2).
(1) Ik geef mijn vriend een boek.
    I give my friend a book
    ‘I give my friend a book.’
(2) Ik geef aan mijn vriend een boek.
    I give to my friend a book
    ‘I give a book to my friend.’
We will investigate the occurrence of indirect objects in the LASSY Small Treebank [van Noord et al., 2006, in press], the manually annotated part of the LASSY treebank (65k sentences). GrETEL [Augustinus et al., 2012] is used as a tool for treebank mining. Section 2 addresses two main research questions. First, section 2.1 illustrates how direct and indirect objects are represented in the treebank. Section 2.2 then deals with np/pp alternation proper: we use GrETEL to look for occurrences of verbs taking np versus pp indirect objects. In section 2.3 we will investigate whether GrETEL can be used to retrieve sentences with a specific ordering of direct and indirect object. In section 3 we will draw conclusions with respect to the case study, and we will point out some suggestions for improving GrETEL.
2 NP/PP Alternation in LASSY
2.1 Representation of Direct and Indirect Objects
Before turning to the treebank mining, we briefly consider the representation of direct (DO) and indirect objects (IO) in LASSY.
1 As Dutch has only limited case marking, the term NP/PP alternation will be used instead of dative alternation.
Figure 1: Alpino parse trees of examples (1) and (2)
Figure 1 presents the tree representations of examples (1) and (2), parsed with Alpino [van Noord, 2006].2 The trees show that indirect objects are indicated with the feature obj2 (secondary object), which can be contrasted to obj1, the dependency relation to indicate direct objects.3 More information on the syntactic annotation of the LASSY treebank can be found in van Noord et al. [2011].
2.2 NPs versus PPs
In order to retrieve sentences in which a verb takes an IO(np) and sentences with an IO(pp), we use example sentences (1) and (2) as input for GrETEL.4
Figure 2: Input matrices of examples (1) and (2)
2 Alpino outputs dependency trees with the same lay-out as the LASSY treebank.
3 The obj2 feature includes indirect object, beneficiary, and experiencer roles [van Noord et al., 2011].
4 We have carried out the treebank search of both examples one after another, but for explanatory reasons we present the two examples side by side.
Figure 2 shows the input matrices of examples (1) and (2) respectively. Since we are not looking for specific word forms or lemmas, pos is selected for the verb form geef [E: ‘give’], the head noun of the indirect object vriend [E: ‘friend’], and the preposition aan [E: ‘to’] in the case of example (2).
(1) smain
      hd verb
      obj2 np
        hd noun
(2) smain
      hd verb
      obj2 pp
        hd prep
        obj1 np
          hd noun
Figure 3: Subtrees extracted from examples (1) and (2)
The subtrees that are generated from those input examples are presented in Figure 3. In the next step those subtrees are converted into XPath queries. The XPath query corresponding to the subtree extracted from example (1) is given in (3), and the query corresponding to the subtree derived from example (2) is shown in (4).
(3)
//node[@cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="obj2" and @cat="np" and node[@rel="hd" and @pos="noun"]]]
(4)
//node[@cat="smain" and node[@rel="hd" and @pos="verb"] and node[@rel="obj2" and @cat="pp" and node[@rel="hd" and @pos="prep"] and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun"]]]]
In GrETEL’s basic search mode, the queries are immediately matched against the treebank. For the first query, GrETEL returns 189 hits spread over 188 sentences that contain IO(np)s. In the case of IO(pp)s, 130 hits are found in 130 sentences. Ten results found by queries (3) and (4) are presented in Table 1 and Table 2 respectively.
• Vaak levert dit de exporterende landen zo weinig op, dat het nauwelijks lonend is. [WR-P-E-I-0000006366.p.8.s.184]
• Op dat moment kwam Horrocks Generaal Thomas te hulp. [WR-P-E-I-0000016944.p.3.s.233]
• Dit was de Britten echter niet genoeg. [WR-P-E-I-0000027216.p.1.s.121]
• De Vlaamse graven gaven de nederzetting een van torens voorziene vestingmuur. [WR-P-E-I-0000044854.p.3.s.14]
• Het economisch herstel dat in 1924 intrad gaf de burger weer moed. [WR-P-E-I-0000054957.p.2.s.237.3]
• FNB vraagt haar klanten een contributie van euro 25,- per jaar. [WR-P-P-E-0000000001.p.8.s.7]
• De geluidseffecten geven de liedjes net iets meer schwung. [WR-P-P-H-0000000076.p.4.s.8]
• Topfotografe Annie Leibovitz geeft het blad een duidelijk gezicht. [dpc-ind-001631-nl-sen.p.47.s.5]
• Hetzelfde overkwam onlangs Amerikaanse en Duitse militaire computers. [dpc-ind-001645-nl-sen.p.4.s.2]
• Leopold stelde de regering zo hoge eisen over zijn terugkeer, dat die steeds meer op troonsafstand aandrong. [wiki-6879.p.12.s.3]
Table 1: 10 results of verbs taking an IO(np) found by query (3)
• ‘Che’ gaf mede richting aan het linkse procommunistische pad van het Castro-regime. [WR-P-E-I-0000004745.p.5.s.41]
• De hoogbejaarde Edison stuurde hierover een vriendelijke bedankbrief aan de hr Vink: [WR-P-E-I-0000049645.p.1.s.190.1]
• Binnenkort brengt Sharon een bezoek aan president Bush. [WS-U-E-A-0000000033.p.26.s.8]
• Huizenbewoners betalen die lasten aan hun gemeente. [WS-U-E-A-0000000234.p.22.s.3]
• Ter gelegenheid van deze titel bracht Albert samen met zijn moeder een bezoek aan deze stad. [wiki-889.p.4.s.3]
• Ensor vraagt aan een huisschilder verf te bereiden in potten van 5 en 10 kg. [wiki-832.p.30.s.3]
• Premier Balkenende geeft nu ook een Europese draai aan zijn oproep tot normen en waarden. [WS-U-E-A-0000000248.p.22.s.1]
• Met zijn werk geeft De Solà-Morales vorm en materie aan die consensus. [dpc-cam-001283-nl-sen.p.10.s.1]
• Hij verklaarde aan de pers dat hij ‘de noordelijke grens van Rusland zou vastleggen’. [dpc-ind-001639-nl-sen.p.5.s.2]
• Hierdoor staan de betrokken inrichtende machten een deel van hun autonomie af aan de netten. [dpc-vla-001175-nl-sen.p.47.s.2]
Table 2: 10 results of verbs taking an IO(pp) found by query (4)
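The matching step behind queries (3) and (4) can be imitated outside GrETEL. The following is a minimal sketch in Python, assuming hand-written toy trees in an Alpino-like XML layout rather than actual LASSY files; the helper names (`kids`, `matches_q3`, `matches_q4`, `count_hits`) are hypothetical. The standard-library `xml.etree.ElementTree` module supports only a small XPath subset, so the conditions of each query are restated as plain Python predicates instead of being passed in as XPath strings.

```python
import xml.etree.ElementTree as ET

# Hand-written toy trees for examples (1) and (2); illustrative, not treebank data.
TREE_1 = ET.fromstring("""
<node cat="top"><node cat="smain">
  <node rel="su" pos="pron" word="Ik"/>
  <node rel="hd" pos="verb" word="geef"/>
  <node rel="obj2" cat="np">
    <node rel="det" pos="det" word="mijn"/>
    <node rel="hd" pos="noun" word="vriend"/>
  </node>
  <node rel="obj1" cat="np">
    <node rel="det" pos="det" word="een"/>
    <node rel="hd" pos="noun" word="boek"/>
  </node>
</node></node>
""")

TREE_2 = ET.fromstring("""
<node cat="top"><node cat="smain">
  <node rel="su" pos="pron" word="Ik"/>
  <node rel="hd" pos="verb" word="geef"/>
  <node rel="obj2" cat="pp">
    <node rel="hd" pos="prep" word="aan"/>
    <node rel="obj1" cat="np">
      <node rel="det" pos="det" word="mijn"/>
      <node rel="hd" pos="noun" word="vriend"/>
    </node>
  </node>
  <node rel="obj1" cat="np">
    <node rel="det" pos="det" word="een"/>
    <node rel="hd" pos="noun" word="boek"/>
  </node>
</node></node>
""")


def kids(node, **attrs):
    """Direct <node> children whose attributes equal all the given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]


def matches_q3(n):
    """Restates query (3): main clause, verbal head, obj2 np with a nominal head."""
    return (n.get("cat") == "smain"
            and bool(kids(n, rel="hd", pos="verb"))
            and any(kids(o, rel="hd", pos="noun")
                    for o in kids(n, rel="obj2", cat="np")))


def matches_q4(n):
    """Restates query (4): obj2 pp with a prepositional head and an np complement."""
    return (n.get("cat") == "smain"
            and bool(kids(n, rel="hd", pos="verb"))
            and any(kids(o, rel="hd", pos="prep")
                    and any(kids(c, rel="hd", pos="noun")
                            for c in kids(o, rel="obj1", cat="np"))
                    for o in kids(n, rel="obj2", cat="pp")))


def count_hits(tree, matcher):
    """Number of nodes in one sentence tree that satisfy the matcher."""
    return sum(1 for n in tree.iter("node") if matcher(n))


# Hits are counted per matching node, so one sentence can contribute several hits.
corpus = {"s1": TREE_1, "s2": TREE_2}
hits_q3 = {sid: count_hits(tree, matches_q3) for sid, tree in corpus.items()}
total_hits = sum(hits_q3.values())
sentences_with_hits = sum(1 for h in hits_q3.values() if h > 0)
print(total_hits, sentences_with_hits)
```

Counting matching nodes rather than matching sentences is what allows the number of hits to exceed the number of sentences, as in the 189 hits over 188 sentences reported for query (3).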
The results found by (3) and (4) show that GrETEL indeed returns sentences which contain a verb taking an indirect object (obj2). It should be noted, however, that both queries look for indirect objects in very specific constructions. Since the top nodes of both subtrees contain the smain feature, GrETEL will only consider main clauses. Furthermore, GrETEL will only take into account obj2s headed by common nouns, as the noun feature does not include proper nouns and pronouns. Those parts of speech can be retrieved by the features name and pron respectively. There are two options to find such constructions with GrETEL. An easy way to find more obj2 constructions is to feed different input sentences to the system. The examples in (5)-(7) will find sentences that are not included in the results of queries (3) and (4).
(5)
Ik geef hem een boek. ‘I give him a book.’
(6)
Ik geef Jan een boek. ‘I give John a book.’
(7)
. . . dat ik mijn vriend een boek geef. ‘. . . that I give my friend a book.’
Another solution to find more matches is to refine the queries in the advanced search mode. One could for example underspecify the category of the top node in the subtree by removing the smain feature. In that way the queries will look for both main clauses and subclauses. Furthermore, we adapted the query in order to include pronoun and proper noun obj2s. That results in the generalised queries (8) and (9). (8)
//node[node[@rel="hd" and @pos="verb"] and node[@rel="obj2" and (@cat="np" or @pos="noun" or @pos="pron" or @pos="name")]]
(9)
//node[node[@rel="hd" and @pos="verb"] and node[@rel="obj2" and @cat="pp" and node[@rel="hd" and @pos="prep"] and node[@rel="obj1" and (@cat="np" or @pos="noun" or @pos="pron" or @pos="name")]]]
Those generalised queries return more results at once. Query (8) finds 1402 hits in 1376 sentences; query (9) returns 727 hits spread over 716 sentences. Five results of each query that were not included by the previous queries are presented in Table 3 and Table 4 respectively.
• Maar blinden ‘zien’ vooral de kansen die computers hen bieden. [dpc-cam-001284-nl-sen.p.13.s.2]
• Een samenleving die mensen de kans biedt het beste uit zichzelf te halen. [dpc-bal-001236-nl-sen.p.16.s.4]
• Graag wil ik u schetsen welke weg we daarbij kunnen bewandelen. [dpc-bal-001238-nl-sen.p.20.s.3]
• Internationale studenten geven K.U.Leuven globaal goede beoordeling [dpc-cam-001279-nl-sen.p.1.s.1]
• Honderd jaar geleden deed Leo Baekeland het hen voor. [dpc-gaz-001006-nl-sen.p.3.s.1]
Table 3: 5 results of verbs taking an IO(np) found by query (8)
• Zenuwcellen die meestal geprikkeld worden als je je lievelingskostje eet, reageren ook als je geld aan een goed doel geeft [dpc-ind-001634-nl-sen.p.3.s.1]
• Hier zijn vensters en deuren geplaatst in doorlopende nissen die aan de gevel een strakke gevelindeling verlenen. [WR-P-E-I-0000044854.p.3.s.69]
• “Vraag maar aan Simon (Kelner, hoofdredacteur van The Independent).” [dpc-ind-001631-nl-sen.p.30.s.3]
• Elk van deze bedrijven leende zijn naam aan een van de tentoonstellingszalen. [dpc-qty-000929-nl-sen.p.11.s.4]
• In het raam van de thans ondernomen actie zullen we ze aan de Juridische Dienst voorleggen. [dpc-riz-001055-nl-sen.p.3.s.2]
Table 4: 5 results of verbs taking an IO(pp) found by query (9)
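The effect of generalising query (3) into query (8) can be illustrated on toy input. The sketch below is written in Python under stated assumptions: the trees are hand-built stand-ins for examples (1) and (5)-(7), the subordinate clause in (7) is assumed to carry the category ssub, and the helper names (`clause`, `kids`, `matches_q3`, `matches_q8`) are hypothetical. The conditions of both queries are restated as Python predicates.

```python
import xml.etree.ElementTree as ET


def clause(cat, obj2_xml):
    """Build a toy clause of the given category around an obj2 fragment (illustrative only)."""
    return ET.fromstring(
        f'<node cat="{cat}"><node rel="su" pos="pron"/>'
        f'<node rel="hd" pos="verb"/>{obj2_xml}'
        '<node rel="obj1" cat="np"><node rel="hd" pos="noun"/></node></node>')


NP_IO = '<node rel="obj2" cat="np"><node rel="hd" pos="noun"/></node>'
EX_1 = clause("smain", NP_IO)                             # (1) mijn vriend
EX_5 = clause("smain", '<node rel="obj2" pos="pron"/>')   # (5) hem
EX_6 = clause("smain", '<node rel="obj2" pos="name"/>')   # (6) Jan
EX_7 = clause("ssub", NP_IO)                              # (7) subordinate clause (assumed ssub)


def kids(node, **attrs):
    """Direct <node> children whose attributes equal all the given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]


def matches_q3(n):
    """Restates query (3): smain, verbal head, obj2 np with a nominal head."""
    return (n.get("cat") == "smain"
            and bool(kids(n, rel="hd", pos="verb"))
            and any(kids(o, rel="hd", pos="noun")
                    for o in kids(n, rel="obj2", cat="np")))


def matches_q8(n):
    """Restates query (8): any clause category; obj2 is an np or a bare noun, pronoun or name."""
    return (bool(kids(n, rel="hd", pos="verb"))
            and any(o.get("cat") == "np" or o.get("pos") in {"noun", "pron", "name"}
                    for o in kids(n, rel="obj2")))


for label, tree in [("(1)", EX_1), ("(5)", EX_5), ("(6)", EX_6), ("(7)", EX_7)]:
    print(label, matches_q3(tree), matches_q8(tree))
```

On these toy trees the specific query accepts only example (1), whereas the generalised query accepts all four, which mirrors the increase in hits reported for query (8).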
Van der Beek [2005] mentions that the choice of an indirect object to be realised as an np or a pp is mainly based on lexical preferences of the verb. To investigate that further, it would be interesting to extract a list of verbs taking an indirect object from LASSY. Unfortunately this is currently not possible with GrETEL.5
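Outside GrETEL, such a verb list could in principle be derived by post-processing the XML of the treebank. The following Python sketch shows the idea on toy clauses only. It assumes that the verbal head carries a `lemma` attribute and that the XML files are directly accessible; both are assumptions, not features described in this study. The function name `io_realisations` is hypothetical, and the pp test is simplified to the category of the obj2 node, without the further conditions of query (9).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy clauses; the `lemma` attribute on the verbal head is an assumption about the XML.
TOY = [ET.fromstring(x) for x in (
    '<node cat="smain"><node rel="hd" pos="verb" lemma="geven"/>'
    '<node rel="obj2" cat="np"/></node>',
    '<node cat="smain"><node rel="hd" pos="verb" lemma="geven"/>'
    '<node rel="obj2" cat="pp"/></node>',
    '<node cat="ssub"><node rel="hd" pos="verb" lemma="bieden"/>'
    '<node rel="obj2" pos="pron"/></node>',
    '<node cat="smain"><node rel="hd" pos="verb" lemma="geven"/>'
    '<node rel="obj2" cat="np"/></node>',
)]


def io_realisations(tree):
    """Yield (verb lemma, 'np' or 'pp') for every node with a verbal head and an obj2."""
    for n in tree.iter("node"):
        heads = [c for c in n.findall("node")
                 if c.get("rel") == "hd" and c.get("pos") == "verb"]
        if not heads:
            continue
        for o in n.findall("node"):
            if o.get("rel") != "obj2":
                continue
            if o.get("cat") == "pp":
                kind = "pp"
            elif o.get("cat") == "np" or o.get("pos") in {"noun", "pron", "name"}:
                kind = "np"
            else:
                continue  # other obj2 realisations are ignored in this sketch
            yield heads[0].get("lemma"), kind


counts = Counter(pair for tree in TOY for pair in io_realisations(tree))
for (lemma, kind), n in sorted(counts.items()):
    print(lemma, kind, n)
```

A table of this kind, computed over the whole treebank, would make it possible to compare the np and pp counts per verb, which is the lexical preference Van der Beek refers to.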
2.3 Order of Direct Object and Indirect Object
In this section we consider a topic related to np/pp alternation: the word order of direct and indirect object. A discussion of the position of both direct and indirect object can be found in Haeseryn et al. [1997], and a treebank-based investigation of the topic is described in Van der Beek [2005]. Van der Beek [2005] notes that there are four variants of the realisation of the indirect object in Dutch (in contrast to English, which has only two ways of expressing the indirect object). As in English, the indirect object can occur as an np or as a pp (cf. section 1). Dutch differs from English in that its word order is freer. Both realisations of the indirect object can occur before or after the direct object, as is shown in examples (10a)-(10d). The indirect objects are indicated in bold; the direct objects are underlined. (10)
a. Ik geef een boek aan mijn vriend.
   I give a book to my friend
   ‘I give a book to my friend.’
b. Ik geef aan mijn vriend een boek.
   I give to my friend a book
   ‘I give a book to my friend.’
c. Ik geef mijn vriend een boek.
   I give my friend a book
   ‘I give my friend a book.’
d. Ik geef het hem.
   I give it him
   ‘I give him it.’
   (ik heb dat mijn vriend gegeven.)
5 Extracting lists of verbs is possible with Dact [de Kok, 2010].
According to Van der Beek [2005], (10a) and (10c) are constructions with canonical word order, whereas (10b) and (10d) are their non-canonical counterparts. So, if both direct and indirect object are realised as nps, the canonical word order is IO(np) < DO(np). In the case of prepositional indirect objects, the canonical word order is DO(np) < IO(pp). In example (10b), the indirect object pp is shifted in front of the direct object np. If both arguments are nps, the shift of a direct object in front of the indirect object can only occur if the direct object is not a full np [Haeseryn et al., 1997]. Compare (10d) to the ungrammatical example (11). (11)
* Ik geef een boek mijn vriend.
  I give a book my friend
Van der Beek’s treebank search on CGN and the Alpino treebank confirmed that the constructions with canonical word order occur more frequently than the non-canonical variants. In the remainder of this section, we will compare the overall distribution of the four alternants in LASSY Small to Van der Beek’s findings.6 We will make use of GrETEL to query the LASSY Small treebank, since it is possible to retrieve sentences with a specific word order by activating GrETEL’s built-in Ordering Filter. The examples given in (10) are used as input constructions for GrETEL to investigate the argument order of direct and indirect object in LASSY Small.7 For each input sentence, pos is selected for the head noun of both direct and indirect object, and for the preposition aan [E: ‘to’] in the case of prepositional indirect objects. The queries corresponding to those input constructions are presented in (12). (12)
a. //node[@cat="smain" and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun" and @begin < ../../node[@rel="obj2" and @cat="pp"]/node[@rel="hd" and @pos="prep"]/@begin]] and node[@rel="obj2" and @cat="pp" and node[@rel="hd" and @pos="prep" and @begin < ../node[@rel="obj1" and @cat="np"]/node[@rel="hd" and @pos="noun"]/@begin] and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun"]]]]
6 The treebank investigations will be less detailed than Van der Beek’s study, as she excludes certain occurrences of indirect objects (e.g. the krijgen-passive).
7 We have limited the input to four examples. For a more exhaustive treebank search, one could use more examples, or make the queries more abstract.
b. //node[@cat="smain" and node[@rel="obj2" and @cat="pp" and node[@rel="hd" and @pos="prep" and @begin < ../node[@rel="obj1" and @cat="np"]/node[@rel="hd" and @pos="noun"]/@begin] and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun" and @begin < ../../../node[@rel="obj1" and @cat="np"]/node[@rel="hd" and @pos="noun"]/@begin]]] and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun"]]]
c. //node[@cat="smain" and node[@rel="obj2" and @cat="np" and node[@rel="hd" and @pos="noun" and @begin < ../../node[@rel="obj1" and @cat="np"]/node[@rel="hd" and @pos="noun"]/@begin]] and node[@rel="obj1" and @cat="np" and node[@rel="hd" and @pos="noun"]]]
d. //node[@cat="smain" and node[@rel="obj1" and @pos="det" and @begin < ../node[@rel="obj2" and @pos="pron"]/@begin] and node[@rel="obj2" and @pos="pron"]]
The results of those queries for the complete LASSY Small treebank are given in Table 5. No sentences were found with more than one match.
Argument Order | Hits | Treebank Example
DO(NP) < IO(PP) [cf. (10a)] | 74 | Op 28 augustus 2002 bracht minister-president Balkenende een kennismakingsbezoek aan zijn Belgische ambtgenoot, Guy Verhofstadt. [WR-P-E-H-0000000051.p.293.s.5]
IO(PP) < DO(NP) [cf. (10b)] | 7 | Ten slotte kent de overheid aan elke school een puntenenveloppe toe voor beleids- en ondersteunend personeel [dpc-vla-001175-nl-sen.p.91.s.1]
IO(NP) < DO(NP) [cf. (10c)] | 67 | Topfotografe Annie Leibovitz geeft het blad een duidelijk gezicht. [dpc-ind-001631-nl-sen.p.47.s.5]
DO(NP) < IO(NP) [cf. (10d)] | 3 | Honderd jaar geleden deed Leo Baekeland het hen voor. [dpc-gaz-001006-nl-sen.p.3.s.1]
Table 5: Results of queries (12a - 12d) with argument ordering of direct and indirect object
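The ordering constraints in queries (12a)-(12d) all come down to comparing @begin positions of the two objects. The Python sketch below illustrates that comparison on toy trees for (10a)-(10d). It is not GrETEL's Ordering Filter: the helper names (`anchor`, `argument_order`) are hypothetical, the trees carry only the attributes needed here, and the token positions are hand-assigned. As in the queries, a phrase is located by the position of its head, and a bare pronoun by its own position.

```python
import xml.etree.ElementTree as ET


def anchor(node):
    """Position used for ordering: the head child's @begin if there is one, else the node's own."""
    for c in node.findall("node"):
        if c.get("rel") == "hd":
            return int(c.get("begin"))  # compare as numbers, not as strings
    return int(node.get("begin"))


def argument_order(clause):
    """Label the relative order of the obj1 and obj2 that are direct children of a clause node."""
    do = next((c for c in clause.findall("node") if c.get("rel") == "obj1"), None)
    io = next((c for c in clause.findall("node") if c.get("rel") == "obj2"), None)
    if do is None or io is None:
        return None
    io_label = "IO(PP)" if io.get("cat") == "pp" else "IO(NP)"
    if anchor(do) < anchor(io):
        return f"DO(NP) < {io_label}"
    return f"{io_label} < DO(NP)"


def np(rel, begin):
    """Toy two-word np: determiner at `begin`, head noun at `begin` + 1."""
    return (f'<node rel="{rel}" cat="np" begin="{begin}">'
            f'<node rel="hd" pos="noun" begin="{begin + 1}"/></node>')


def pp(begin):
    """Toy pp: preposition at `begin`, followed by a two-word np complement."""
    return (f'<node rel="obj2" cat="pp" begin="{begin}">'
            f'<node rel="hd" pos="prep" begin="{begin}"/>{np("obj1", begin + 1)}</node>')


def smain(*parts):
    return ET.fromstring('<node cat="smain"><node rel="hd" pos="verb" begin="1"/>'
                         + "".join(parts) + "</node>")


EXAMPLES = {
    "10a": smain(np("obj1", 2), pp(4)),            # een boek aan mijn vriend
    "10b": smain(pp(2), np("obj1", 5)),            # aan mijn vriend een boek
    "10c": smain(np("obj2", 2), np("obj1", 4)),    # mijn vriend een boek
    "10d": smain('<node rel="obj1" begin="2"/>',   # het hem
                 '<node rel="obj2" begin="3"/>'),
}

for name, tree in EXAMPLES.items():
    print(name, argument_order(tree))
```

Converting @begin to an integer matters once sentences have ten or more tokens, because a string comparison would place "10" before "9".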
The results presented in Table 5 show that the treebank contains more sentences with the canonical order [DO(NP) < IO(PP) and IO(NP) < DO(NP)] than shifted variants [IO(PP) < DO(NP) and DO(NP) < IO(NP)]. Those results are thus in line with Van der Beek’s findings in Alpino and CGN. Van der Beek furthermore investigates which principles influence the choice of a specific construction, such as syntactic weight, pronominality and definiteness. Investigating such underlying motivations with regard to the results presented in this study is left for future research.
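The size of the preference for canonical order can be made explicit with a small calculation over the hit counts in Table 5. The Python sketch below only restates those four counts as shares; the variable and function names are illustrative.

```python
# Hit counts copied from Table 5.
hits = {
    "DO(NP) < IO(PP)": 74,  # canonical, cf. (10a)
    "IO(PP) < DO(NP)": 7,   # shifted, cf. (10b)
    "IO(NP) < DO(NP)": 67,  # canonical, cf. (10c)
    "DO(NP) < IO(NP)": 3,   # shifted, cf. (10d)
}


def canonical_share(canonical, shifted):
    """Proportion of canonical orderings among all hits for one realisation."""
    return canonical / (canonical + shifted)


pp_share = canonical_share(hits["DO(NP) < IO(PP)"], hits["IO(PP) < DO(NP)"])
np_share = canonical_share(hits["IO(NP) < DO(NP)"], hits["DO(NP) < IO(NP)"])

print(f"PP indirect objects: {pp_share:.1%} canonical")
print(f"NP indirect objects: {np_share:.1%} canonical")
```

For these counts, roughly 91% of the prepositional indirect objects and roughly 96% of the nominal ones occur in canonical order. The counts stem from the four specific queries in (12), so the shares describe those constructions only.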
3 Conclusions
In this case study we have tested whether GrETEL is a suitable tool to investigate np/pp alternation. For both aspects of the case study (i.e. verbs selecting nps and/or pps, and the order of DO and IO), GrETEL returned sentences similar to the input examples. The results in section 2.2 show that IO(NP)s are retrieved more often than IO(PP)s (1402 versus 727 hits for the generalised queries). Unfortunately, it is not possible to generate a list of verbs taking an IO with GrETEL, since only full sentences are returned. The data found in section 2.3 reveal a marked difference in frequency between the canonical order of DO and IO and the shifted variants. It should be noted, however, that GrETEL sticks closely to the input examples, which results in very specific search instructions. In some cases it is hard to generalise the obtained results. Usually the XPath expressions can be generalised easily by slightly adapting them (for example by removing the ‘top’ category). Finally, the ordering filter proved to be very helpful for building XPath expressions in which the order of the nodes is relevant, since such queries are usually very complicated and thus error-prone if they have to be built up from scratch.
References
Liesbeth Augustinus, Vincent Vandeghinste, and Frank Van Eynde. Example-based treebank querying. In Proceedings of LREC’12, Istanbul, Turkey, 2012.
D. de Kok. Dact [Decaffeinated Alpino Corpus Tool]. URL: http://rug-compling.github.com/dact, 2010.
Walter Haeseryn, Kirsten Romijn, Guido Geerts, Jaap de Rooij, and Maarten van den Toorn. Algemene Nederlandse Spraakkunst. Martinus Nijhoff/Wolters Plantyn, Groningen/Deurne, second edition, 1997.
William Van Belle and Willy Van Langendonck. The indirect object in Dutch. In William Van Belle and Willy Van Langendonck, editors, The Dative. Descriptive Studies, volume I, pages 217–250. John Benjamins, Amsterdam/Philadelphia, 1996.
Leonoor van der Beek. Topics in Corpus-Based Dutch Syntax. PhD thesis, Rijksuniversiteit Groningen, Groningen, The Netherlands, 2005.
Gertjan van Noord. At Last Parsing Is Now Operational. In TALN 2006, pages 20–42, 2006.
Gertjan van Noord, Ineke Schuurman, and Vincent Vandeghinste. Syntactic annotation of large corpora in STEVIN. In Proceedings of LREC’06, pages 1811–1814, Genoa, Italy, 2006.
Gertjan van Noord, Ineke Schuurman, and Gosse Bouma. Lassy Syntactische Annotatie, Revision 19455, 2011. www.let.rug.nl/vannoord/Lassy/saman lassy.pdf.
Gertjan van Noord, Gosse Bouma, Frank Van Eynde, Daniel de Kok, Jelmer van der Linde, Ineke Schuurman, Erik Tjong Kim Sang, and Vincent Vandeghinste. Large Scale Syntactic Annotation of Written Dutch: Lassy. In Essential Speech and Language Technology for Dutch: resources, tools and applications. Springer, in press.