Large scale syntactic annotation for Dutch
Gertjan van Noord, University of Groningen
Context: Wide-coverage Parsing
• Assign syntactic structure to a sentence
• A necessary step in determining the meaning
Context: Wide-coverage Parsing (2)
• Met de verrekijker zie ik de man (With the binoculars, I see the man)
• De man met de verrekijker zie ik (The man with the binoculars, I see)
[Two dependency trees: in 'Met de verrekijker zie ik de man' the pp 'met de verrekijker' modifies the verb 'zie'; in 'De man met de verrekijker zie ik' it modifies the noun 'man'.]
Context: Wide-coverage Parsing (3)
• Dit is de vrouw die de mannen hebben gezien (This is the woman whom the men have seen)
• Dit is de vrouw die de mannen heeft gezien (This is the woman who has seen the men)
[Two dependency trees for the relative clause: with plural 'hebben', 'de mannen' is the subject of the relative clause and 'die' its object; with singular 'heeft', 'die' (the woman) is the subject and 'de mannen' the object.]
Parsing: state of the art
• Full parsing is
  – fragile
  – slow
  – inaccurate
  This is no longer true!
• Improvements:
  – robustness
  – efficiency
  – disambiguation
• Corpora!
Syntactic Annotation - past
• Penn Treebank (1989)
• By linguists (students)
• Resource for NLP research/development:
  – train (statistical) models
  – evaluate (statistical) models
• Revolution in NLP
Syntactic Annotation - this talk
• for Dutch
• manually corrected:
  – by linguists (students)
  – with the Alpino parser and related tools
• fully automatically:
  – with the Alpino parser and related tools
  – huge
  – many more applications
Overview
Syntactically annotated corpora are great!
• Small manually corrected treebanks:
  – . . . for disambiguation in a parser
• Huge automatically created treebanks:
  – . . . for improved disambiguation in a parser
  – . . . for corpus linguistics
  – . . . for information extraction / question answering
Alpino
• Parser for Dutch
• Characteristics:
  – wide-coverage
  – robust
  – accurate
• Formalism: Stochastic Attribute Value Grammar
  – linguistic sophistication
  – principled account of disambiguation
• Output: CGN dependency structures
CGN Dependency Structures
• CGN: Corpus of Spoken Dutch
• abstract representation of syntactic analysis
• de-facto standard
• hierarchical information: which words belong together
• relational information: hd, su, obj1, obj2, pc, . . .
• categorial information: np, pp, smain, . . .
Vier jonge Rotterdammers willen deze zomer per auto naar Japan (Four young Rotterdammers want to go to Japan by car this summer)
[Dependency tree: smain headed by 'wil', with subject np 'vier jonge Rotterdammers', modifier np 'deze zomer', modifier pp 'per auto', and directional complement (ld) pp 'naar Japan'.]
Er was een tijd dat Amerika met bossen overdekt was (There was a time when America was covered with forests)
[Dependency tree: the clause 'dat Amerika met bossen overdekt was' analysed with 'overdekt' as the participle of the verb 'overdekken'.]
Extrinsic Motivation
corpus                                  # sentences   length   accuracy %   exact %
Alpino Treebank (newspaper)                    7136       20         89.1      41.5
CLEF questions (tuned for questions)           1745       11         96.3      82.1
D-Coi-Gr Treebank                              8857       15         88.4      48.3
D-Coi WR-P-E-E (newsletters)                     90       20         81.1      31.1
D-Coi WR-P-P-B (children book)                  276        7         93.5      79.0

Accuracy: in terms of named dependencies
Syntactic Analysis in Alpino
• Lexicon
  – over 200,000 entries (including many named entities)
  – extensive set of heuristics for unseen words and word sequences
  – mapped to attribute-value matrices
  – organized as inheritance network
  – POS-tagger removes unlikely lexical categories
• Grammar
  – rewrite rules where categories are attribute-value matrices
  – unification (a toy sketch follows below)
  – rule set organized as inheritance network
• Parser
  – constructs parse forest: compact representation of all possible parses
  – selects best parse: disambiguation
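Alpino's grammar formalism is not shown in these slides; purely as an illustration of the unification step, here is a toy sketch in Python over attribute-value matrices encoded as nested dicts (all structures invented):

def unify(a, b):
    """Unify two attribute-value matrices (nested dicts).
    Atomic values unify only if they are equal; returns None on a clash."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    result = dict(a)
    for attr, bval in b.items():
        if attr in result:
            sub = unify(result[attr], bval)
            if sub is None:
                return None                  # feature clash: no parse here
            result[attr] = sub
        else:
            result[attr] = bval
    return result

# a determiner and a noun agreeing on number (invented toy entries):
det  = {"cat": "det",  "agr": {"num": "sg"}}
noun = {"cat": "noun", "agr": {"num": "sg", "gen": "de"}}
print(unify(det["agr"], noun["agr"]))        # {'num': 'sg', 'gen': 'de'}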
Ambiguity in Alpino
[Plot: average number of readings (y-axis, 0 to 15000) against sentence length in words (x-axis, 5 to 15).]
Ambiguity
• the expected lexical and structural ambiguities
• many, many, many unexpected, absurd ambiguities
• many don't-care ambiguities
• longer sentences have millions of parses
Er was een tijd dat Amerika met bossen overdekt was
[Dependency tree: the correct parse, repeated from above.]
Er was een tijd dat Amerika met bossen overdekt was
[Dependency tree of an absurd alternative parse: 'was' analysed as the noun 'was' ('wax') heading a predicative np, with 'overdekt' as an adjectival modifier and 'dat' as its determiner.]
Er was een tijd dat Amerika met bossen overdekt was
[Dependency tree of a further parse: 'overdekt' analysed as a predicative adjective (predc) rather than as the participle of 'overdekken'.]
Vier jonge Rotterdammers willen deze zomer per auto naar Japan
[Dependency tree of an absurd parse: 'vier' analysed as an imperative verb ('celebrate') heading an sv1 clause, with 'jonge Rotterdammers' as its object.]
Door de overboeking vertrok een groep toeristen uit het hotel (Because of the overbooking, a group of tourists left the hotel)
[Dependency tree of the intended parse.]
• Zempléni: unambiguously literal sentence
• Alpino: 13 parses
Door de overboeking vertrok een groep toeristen uit het hotel
[Dependency tree of an alternative parse: 'uit het hotel' attached inside the noun phrase rather than to the verb.]
Disambiguation Model
• Identify features for disambiguation: arbitrary characteristics of parses
• Training the model: assign a weight to each feature, by
  – increasing the weights of features in the correct parse
  – decreasing the weights of features in incorrect parses
• Applying the model:
  – for each parse, sum the weights of the features occurring in it
  – select the parse with the highest sum
• Maximum Entropy (a toy sketch of applying such a model follows)
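A minimal sketch of applying such a model, assuming each parse is described by a bag of feature counts; the two toy weights reuse features from the weight tables below, and Alpino's actual maximum-entropy training is not shown:

from collections import Counter

def score(features, weights):
    """Sum the weights of all features occurring in a parse."""
    return sum(weights.get(f, 0.0) * n for f, n in features.items())

def select_best(parses, weights):
    """Select the parse with the highest weight sum."""
    return max(parses, key=lambda p: score(p, weights))

# toy example: two competing parses described by their feature counts
weights = {"f2(en,vg)": 0.0741717, "f2(was,noun)": -0.0585366}
parse_a = Counter({"f2(en,vg)": 1})
parse_b = Counter({"f2(was,noun)": 1})
assert select_best([parse_a, parse_b], weights) is parse_a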
Training
• Requires a corpus of correct and incorrect parses
• Alpino Treebank:
  – newspaper part (cdbl) of the Eindhoven corpus
  – 145,000 words
  – manually checked syntactic annotations (Leonoor van der Beek, . . . )
  – CGN dependency structures
• Generate all parses with Alpino, and use the treebank to classify each parse
Features
• Describe arbitrary properties of parses
• Need not be independent of each other
• Can encode a variety of linguistic (and other) preferences
• Linguistic Insights!
Feature templates

r1(Rule)                 Rule has been applied
r2(Rule,N,SubRule)       the N-th daughter of Rule is constructed by SubRule
r2_root(Rule,N,Word)     the N-th daughter of Rule is Word
r2_frame(Rule,N,Frame)   the N-th daughter of Rule is a word with subcat frame Frame
r3(Rule,N,Word)          the N-th daughter of Rule is headed by Word
mf(Cat1,Cat2)            Cat1 precedes Cat2 in the mittelfeld
f1(Pos)                  POS-tag Pos occurs
f2(Word,Pos)             Word has POS-tag Pos
h(Heur)                  unknown-word heuristic Heur has been applied
Dependency feature templates
dep35(Sub,Role,Word)     Sub is the Role dependent of Word
dep34(Sub,Role,Pos)      Sub is the Role dependent of a word with POS-tag Pos
dep23(SubPos,Role,Pos)   a word with POS-tag SubPos is the Role dependent of a word with POS-tag Pos
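For illustration, a sketch that instantiates the three templates for one dependency edge (word forms and POS-tags invented; the real feature inventory differs in details):

def dep_features(sub_word, sub_pos, role, head_word, head_pos):
    """Instantiate dep35, dep34 and dep23 for a single dependency edge."""
    return [
        f"dep35({sub_word},{role},{head_word})",
        f"dep34({sub_word},{role},{head_pos})",
        f"dep23({sub_pos},{role},{head_pos})",
    ]

# the two dependents of 'drink' in 'Bier drinkt de vrouw' (OVS reading):
print(dep_features("vrouw", "noun", "su", "drink", "verb"))
print(dep_features("bier", "noun", "obj1", "drink", "verb"))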
Some non-local features
• In coordinated structure, the conjuncts are parallel or not
• In extraction structure, the extraction is local or not
• In extraction structure, the extracted element is a subject
• Constituent ordering in the mittelfeld:
  – pronoun precedes full np
  – accusative pronoun precedes dative pronoun
  – dative full np precedes accusative full np
Features indicating bad parses
-0.0707213   h1(long)
-0.0585366   f2(was,noun)
-0.0507852   f2(tot,vg)
-0.0497879   h1(decap(not_begin))
-0.0494901   s1(extra_from_topic)
-0.0411195   r3(np_det_n,2,was)
-0.0410466   f2(op,prep)
-0.0372584   f2(kan,noun)
-0.0337606   h1(skip)
Features indicating good parses
0.0741717   f2(en,vg)
0.064064    dep35(en,vg,hd/obj1,prep,tussen)
0.0549897   f2(word,verb(passive))
0.0461192   r2(non_wh_topicalization(np),1,np_pron_weak)
0.039418    s1(subj_topic)
0.0387447   dep23(pron(wkpro,nwh),hd/su,verb)
Results Parse Selection
• Alpino treebank
• ten-fold cross-validation
• the model should select the best parse for each sentence, out of maximally 1000 parses per sentence
• accuracy: proportion of correct named dependencies
                       accuracy %
baseline                    61.5
oracle                      89.2
model                       84.0
error reduction rate        81.5
exact match                 55
Wrap up
So far:
• background about Alpino
• manually annotated treebank to train and test the disambiguation component
Next:
• applications of automatically constructed treebanks
Automatically constructed treebanks
• Corpora automatically annotated with the Alpino parser:
  – Twente News Corpus (TwNC) (500M words, newspapers)
  – D-Coi (55M words, including Dutch Wikipedia, Dutch Europarl)
  – LASSY (450M words, to be decided)
• Interesting applications . . .
TwNC
#sentences                   30,000,000   100%
#words                      500,000,000
#sentences without parse        100,000   0.2%
#sentences with fragments     2,500,000   8%
#single full parse           27,500,000   92%
Millions of dependency structures
• Compressed archives of XML files
• Pseudo-random access
• dictd gzip
• Storage requirements: 10% of original
• Mostly by Geert Kloosterman
Example
<node rel="top" cat="smain" begin="0" end="10">
  <node rel="su" frame="determiner(het,nwh,nmod,pro,nparg)" pos="det" begin="0" end="1" root="dat" word="…
  <node rel="hd" frame="verb(hebben,past(sg),transitive)" pos="verb" begin="1" end="2" root="wek" word="w…
  <node rel="obj1" cat="np" begin="2" end="10">
    <node rel="det" frame="determiner(de)" pos="det" begin="2" end="3" root="de" word="de" infl="de"/>
    <node rel="hd" frame="noun(de,both,sg)" pos="noun" begin="3" end="4" root="woede" word="woede" gen="d…
    <node rel="mod" cat="pp" begin="4" end="10">
    ....
<sentence>Dat wekte de woede van Turkse inwoners van de wijk .</sentence>
Q#AD19940103-0125-776-2|Dat wekte de woede van Turkse inwoners van de wijk .|1|1|-0.0396969573
Treebank Tools
• DtView
• DtEdit
• DtSearch

DtView
[Screenshot of the DtView tool.]
DtSearch
• XPATH standard
• Search queries:
  – hierarchical relations
  – grammatical relations
  – syntactic category
  – surface order
  – lemma, other attributes
• Matches:
  – display sentence
  – display sentence with brackets
  – display matching part of sentence
  – your own style-sheets
DtSearch Example
dtsearch -s -q '//node[../@cat="smain" and @rel="obj2"
                       and not(@cat="pp") and ./@begin = ../@begin]'

[Haar] ging het goed af .
" [Ons] staat helemaal geen Big Brother-scenario voor ogen .
[Ook hun] past enige schroom .
[Zelfs de bloeddorstigste tegenstander] adviseerde hij nog zijn gedrag wat aan te passen .
[Die] geef ik voor de wedstrijd een zoen
...
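Since the queries are standard XPath over the treebank's XML, the same search can also be scripted; a sketch with lxml, where the file name is hypothetical:

from lxml import etree

# the query from the slide: clause-initial obj2 constituents in an smain
QUERY = ('//node[../@cat="smain" and @rel="obj2" '
         'and not(@cat="pp") and ./@begin = ../@begin]')

tree = etree.parse("AD19940103.xml")         # hypothetical treebank file
for node in tree.xpath(QUERY):
    print(node.get("rel"), node.get("cat"), node.get("begin"), node.get("end"))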
Application: Selection Restrictions for Improved Disambiguation
• Use the automatically parsed corpus to learn selection restrictions
• Bier drinkt de vrouw (Beer, the woman drinks)
• Lexical features:
  – dep35(woman,obj1,drink), dep35(beer,su,drink)
  – dep35(woman,su,drink), dep35(beer,obj1,drink)
• Such features are too infrequent to be useful; the training corpus is too small to estimate weights for those features
Some Actually Occurring Bad Parses
(1) a. Campari moet u gedronken hebben
       bad parse: 'Campari must have drunk you'
       intended: 'You must have drunk Campari'
    b. De wijn die Elvis zou hebben gedronken als hij wijn zou hebben gedronken
       intended: 'The wine Elvis would have drunk if he had drunk wine'
       bad parse: 'The wine that would have drunk Elvis if he had drunk wine'
    c. De paus heeft tweehonderd daklozen te eten gehad
       'The pope had two hundred homeless people for dinner'
Extract lexical dependencies
[Dependency tree of 'Waar en wanneer drinkt Elvis wijn' (Where and when does Elvis drink wine), from which the following lexical dependencies are extracted:]

crd/cnj(en, waar)
crd/cnj(en, wanneer)
whd/body(en, drink)
hd/mod(drink, en)
hd/obj1(drink, wijn)
hd/su(drink, Elvis)
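A sketch of such an extraction over Alpino's XML node elements (cf. the earlier XML example); it simplifies the real extraction, e.g. coindexed nodes are skipped rather than resolved:

from lxml import etree

HEAD_RELS = {"hd", "cmp", "crd", "whd", "rhd"}     # daughters that act as head

def head_root(node):
    """Lexical root of the head word dominated by node (None for index nodes)."""
    if node.get("root") is not None:
        return node.get("root")
    for child in node:
        if child.get("rel") in HEAD_RELS:
            return head_root(child)
    return None

def triples(node):
    """Yield (head_rel, dep_rel, head, dependent) tuples, as on the slide."""
    head_child = next((c for c in node if c.get("rel") in HEAD_RELS), None)
    for child in node:
        if head_child is None or child is head_child:
            continue
        head, dep = head_root(head_child), head_root(child)
        if head and dep:
            yield (head_child.get("rel"), child.get("rel"), head, dep)
    for child in node:
        yield from triples(child)

# an Alpino file's root element <alpino_ds> has the top <node> as a child
tree = etree.parse("sentence.xml")                 # hypothetical file
for hr, dr, h, d in triples(tree.getroot().find("node")):
    print(f"{hr}/{dr}({h}, {d})")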
Number of lexical dependencies
tokens                        480,000,000
types                         100,000,000
types with frequency ≥ 20       2,000,000
Bilexical preference
• Pointwise Mutual Information (Fano 1961, Church and Hanks 1990):

  $I(r(w_1, w_2)) = \log \frac{f(r(w_1, w_2))}{f(r(w_1, \_))\, f(\_(\_, w_2)) / N}$

  (N: the total number of dependency tokens)

• compare actual frequency with expected frequency
• Example: I(hd/obj1(drink, melk)) (checked in the sketch below)
  – f(hd/obj1(drink, melk)): 195
  – f(hd/obj1(drink, _)): 15713
  – f(_(_, melk)): 10172
  – expected frequency: 0.34
  – the actual frequency is about 560 times as big
  – its log: 6.3
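A quick check of these numbers, assuming N = 480,000,000, the token count from the previous slide (small rounding differences against the slide's figures are expected):

from math import log

def pmi(f_joint, f_head, f_dep, n):
    """log of actual over expected frequency, expected = f_head * f_dep / n."""
    return log(f_joint / (f_head * f_dep / n))

N = 480_000_000
print(round(15713 * 10172 / N, 2))              # expected: ~0.33 (slide: 0.34)
print(round(pmi(195, 15713, 10172, N), 1))      # ~6.4 (slide: 6.3)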
Examples of high bilexical preferences
bijltje          gooi neer     13
duimschroef      draai aan     13
peentje          zweet         13
traantje         pink weg      13
boontje          dop           12
centje           verdien bij   12
champagnefles    ontkurk       12
dorst            les           12
Examples of high scoring objects of drink
biertje      small glass of beer      8
borreltje    strong alcoholic drink   8
glaasje      small glass              8
pilsje       small glass of beer      8
pintje       small glass of beer      8
pint         glass of beer            8
wijntje      small glass of wine      8
alcohol      alcohol                  7
bier         beer                     7
Lexical preferences between verbs and modifiers
overlangs         snijd door   12
welig             tier         12
dunnetjes         doe over     11
stiefmoederlijk   bedeel       11
onzedelijk        betast       11
stierlijk         verveel      11
cum laude         studeer af   10
hermetisch        grendel af   10
ingespannen       tuur         10
instemmend        knik         10
kostelijk         amuseer      10
Lexical preferences between nouns and adjectives
endoplasmatisch   reticulum
zelfrijzend       bakmeel
waterbesparende   douchekop
ongeblust         kalk
onbevlekt         ontvangenis
ingegroeid        teennagel
knapperend        haardvuur
geconsacreerde    hostie
bezittelijk       voornaamwoord
pientere          pookje
afgescheurde      kruisband
beklemtoond       lettergreep
Can you guess?

put              bodemloze
sponde           echtelijke
bandiet          eenarmige
zelfverrijking   exhibitionistische
vuist            gebalde
wenkbrauw        gefronst
nonsens          baarlijke, klinkklare
veldtocht        tiendaagse
Using association scores as disambiguation features
• new features z(p, r) for each POS-tag p and dependency relation r
• if there is an r-dependency between word w1 (with POS-tag p) and word w2
• the count of this feature is given by I(r(w1, w2))
• only for positive I
• NB: limited number of features; the treebank is large enough to estimate their weights
Example
• Melk drinkt de baby niet (Milk, the baby does not drink)
• Analysis 1:
  – z(verb,hd/obj1) = 6
  – z(verb,hd/su) = 3
• Analysis 2:
  – z(verb,hd/obj1) = 0
  – z(verb,hd/su) = 0
• weights (combined below):
  – z(verb,hd/obj1): 0.0101179
  – z(verb,hd/su): 0.00877976
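Combining the feature values with the trained weights gives the association contribution to each analysis' score:

w_obj1, w_su = 0.0101179, 0.00877976     # trained weights from the slide

score_1 = 6 * w_obj1 + 3 * w_su          # analysis 1: melk obj1, baby su
score_2 = 0 * w_obj1 + 0 * w_su          # analysis 2: no positive association
print(round(score_1, 4), score_2)        # 0.087 vs 0.0 -> analysis 1 preferred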
Experiment 1
• ten-fold cross-validation, Alpino Treebank

                 fscore %   err.red. %   exact %   CA %
standard            87.41        74.60      52.0   87.02
+self-training      87.91        77.38      54.8   87.51
Experiment 2
• Full system, D-Coi Treebank (Trouw newspaper)

                 prec %   rec %   fscore %   CA %
standard          90.77   90.49      90.63   90.32
+self-training    91.19   90.89      91.01   90.73
Application: Extraposition of comparatives out of topic
57
Application: Extraposition of comparatives out of topic
• Reviewer: extraposition of a comparative out of the topic is impossible:
  *Lager was de koers nog nooit dan bij opening
  (Never before was the price lower than at the opening)
• The Alpino grammar allows this
• We can search for the relevant pattern
Dependency Structure
[Dependency tree of 'Lager was de koers nog nooit dan bij opening': the obcomp 'dan bij opening' belongs to the topicalized ap 'lager' but is extraposed to the right.]
DtSearch queries
//node[@cat="smain"
       and ./node[./node[@rel="obcomp"]]/@begin = @begin]

//node[@cat="smain"
       and ./node[./node[@rel="obcomp"]/@end > ../node[@rel="hd"]/@begin]/@begin = @begin]
Extraposed obcomp out of topic
Liever benadrukt hij die tegenstellingen dan de bedriegelijke harmonie
Nog eerder zal de machtige Mekong droogvallen dan dat de co-premier zijn macht uit handen geeft
Zo intens lelijk zijn mijn voeten in de loop van een decennium geworden dat ik de mensenmassa's op het strand er in de zomer niet mee wil lastigvallen
Eerder brengt men een hemel vol wolken in kaart dan dit oeuvre
Veel eerder vindt er een herschikking in het midden plaats dan dat er werkelijk massaal uit dat midden wordt gevlucht
Eerder is er sprake van het kabinet 'ondanks Kok' dan het 'kabinet-Kok'
Liever sluis ik honderden en honderden guldens door aan loodgieter, fietsenmaker en elektricien dan dat ik zelf ook maar één vinger uitsteek naar het fonteintje bij het toilet, een kapot achterlicht of een weigerende stofzuiger
liever waren ze onafhankelijk dan dat ze zich aan iemand bonden
Liever is Jim schuldig aan een sprong, dan de prooi van een aanvechting
eerder gaat zoo’n kameel door het oog van een naald, dan dat een rijke in zou gaan in het koninkrijk der hemelen
Application: Question Answering, and Similar Words
(2) By whom was John Lennon killed?
(3) Where was he killed?
(4) How often was he hit?
(5) What are Google-bombs?
(6) How high is the Dom-tower in Utrecht?
(7) In what year did its construction start?
(8) Who was the first architect?
Background
• QA-system based on Alpino: JOOST
• Best result in CLEF2005 for Dutch; third result overall
• Best result in CLEF2006 for Dutch; Dutch was made more difficult than other languages
• No results known yet for CLEF2007
Strategy
• Analyse the question into a dependency structure
• Compare the dependency structure with the dependency structures of all potential answers (a crude sketch follows below)
• Potential answers are paragraphs returned by IR from newspaper texts and Dutch Wikipedia
• Use many other techniques in addition:
  – ontological information
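A deliberately crude sketch of the comparison step: score a candidate answer by the overlap of its dependency triples with the question's (JOOST's actual matching is far richer; the triples here are invented):

def overlap(question, answer):
    """Fraction of the question's dependency triples found in the answer."""
    return len(question & answer) / len(question) if question else 0.0

q = {("dood", "obj1", "Lennon"), ("dood", "su", "wie")}        # invented
a = {("dood", "obj1", "Lennon"), ("dood", "su", "Chapman"),
     ("dood", "mod", "in")}                                    # invented
print(overlap(q, a))       # 0.5; 'Chapman' fills the question's 'wie' slot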
Ontological information for QA
(9) Who is Javier Solana?
(10) Which soccer player won the "Golden Bal" in 1999?
(11) In which American state is Iron Mountain?
(12) Which French president opened the Channel Tunnel?
Discover Ontological Information
• Similar words occur in similar contexts
• Dependency relations: a more fine-grained notion of context
  – subject-verb
  – verb-object
  – adjective-noun
  – coordination
  – apposition
  – prepositional complement
Vectors describing contexts
• Every word is represented by an n-dimensional vector
• Every dimension is a context characteristic
• Every cell is a (function of the corresponding) frequency

         zie.obj   verf.obj   verzorg.obj   laat uit.obj   ...
bus           50          5             1              0   ...
hond          56          1             5              8   ...
truck         43          4             0              0   ...
..
Similarity Measure
• Dice (a sketch follows below):

  $\mathrm{Dice}(v, w) = \sum_i \frac{2 \cdot \min(v_i, w_i)}{v_i + w_i}$

• other possibilities . . .
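A sketch of the measure over sparse count vectors, reusing the bus/hond/truck rows of the table above (per-dimension formulation as on the slide; other normalizations exist):

def dice(v, w):
    """Per-dimension Dice sum over the contexts shared by v and w."""
    return sum(2 * min(v[c], w[c]) / (v[c] + w[c]) for c in set(v) & set(w))

bus   = {"zie.obj": 50, "verf.obj": 5, "verzorg.obj": 1}
hond  = {"zie.obj": 56, "verf.obj": 1, "verzorg.obj": 5, "laat_uit.obj": 8}
truck = {"zie.obj": 43, "verf.obj": 4}

print(round(dice(bus, truck), 2))   # 1.81
print(round(dice(bus, hond), 2))    # 1.61: bus is more truck-like than hond-like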
Feature Weights
• frequency
• mutual information
• other possibilities . . .
Data used
subject-verb               5,639,140
verb-object                2,642,356
adjective-noun             3,262,403
coordination                 965,296
apposition                   526,337
prepositional complement     770,631
Results for BMW
Volkswagen, Mercedes, Honda, Chrysler, Audi, Volvo, Ford, Toyota, Fiat, Peugeot, Opel, Mitsubishi, Renault, Mazda, Jaguar, General Motors, Rover, Nissan, VW, Porsche
Results for Sony
Matsushita, Toshiba, Time Warner, JVC, Hitachi, Nokia, Samsung, Motorola, Philips, Siemens, Apple, Canon, IBM, PolyGram, Thomson, Mitsubishi, Kodak, Pioneer, AT&T, Sharp
Hinault
Kübler, Vermandel, Bruyère, Depredomme, Mottiat, Merckx, Depoorter, De Bruyne, Argentin, Schepers, Criquielion, Dierickx, Van Steenbergen, Kint, Bartali, Ockers, Coppi, Fignon, Kelly, De Vlaeminck
Beatles
Rolling Stones, Stones, John Lennon, Jimi Hendrix, Tina Turner, Bob Dylan, Elvis Presley, Michael Jackson, The Beatles, David Bowie, Prince, Genesis, Mick Jagger, The Who, Elton John, Barbra Streisand, Led Zeppelin, Eric Clapton, Diana Ross, Janis Joplin
Paris
Londen, Brussel, Moskou, Washington, Berlijn, New York, Rome, Madrid, Bonn, Wenen, Peking, Frankfurt, Athene, Tokio, München, Barcelona, Praag, Antwerpen, Stockholm, Tokyo
Grenoble
Rouen, Saint Etienne, Pau, Saint-Etienne, Rennes, Marne-la-Vallée, Aix, Orléans, Toulouse, Montpellier, Amiens, Strasbourg, Lyon, Lens, Avignon, Clermont-Ferrand, Straatsburg, Caen, Bayonne, Limoges
Results for Wim Kok
Elco Brinkman, Frits Bolkestein, Hans van Mierlo, W. Kok, Kok, Ruud Lubbers, Den Uyl, John Major, Jacques Wallage, Wallage, Thijs Wöltgens, Hedy d'Ancona, Relus ter Beek, Klaus Kinkel, Balladur, Kinkel, Van Mierlo, Jacques Chirac, Kooijmans, Jan Pronk
huis (house)
woning, gebouw, pand, auto, straat, kantoor, kamer, boerderij, tuin, winkel, kerk, brug, huisje, appartement, hotel, flat, muur, boom, paleis, villa
(house, building, house, car, street, office, room, farm, garden, shop, church, bridge, small house, apartment, hotel, flat, wall, tree, palace, villa)
verliefdheid (infatuation, love)
jaloezie, verraad, afgunst, weerzin, romance, hartstocht, overspel, passie, erotiek, vriendschap, obsessie, schuldgevoelen, fascinatie, vergankelijkheid, seksualiteit, animositeit, seks, lust, verlangen, zeeroof
jealousy, treason, envy, dislike, romance, passion, adultery, passion, erotics, friendship, obsession, feelings of guilt, fascination, transiency, sexuality, animosity, sex, lust, desire, piracy
witlof (chicory)
broccoli, prei, spruitje, knolselderij, andijvie, courgette, sperzieboon, zuurkool, worteltje, bleekselderij, bloemkool, snijboon, aubergine, peen, zilveruitje, ijsbergsla, koolsoort, winterpeen, doperwtjes, komkommer
broccoli, leek, sprout, celeriac, endive, zucchini, butter bean, sauerkraut, carrot, blanched celery, cauliflower, haricot, aubergine, carrot, onion, iceberg lettuce, cabbage, carrot, peas, cucumber
Conclusion
Syntactically annotated corpora are perhaps potentially somewhat useful
It’s Free!
http://www.let.rug.nl/vannoord/alp/Alpino/ http://www.let.rug.nl/vannoord/trees/ http://www.let.rug.nl/vdplas/Sets/browse.php http://www.let.rug.nl/gosse/Sets/