Translation-driven mapping of semantic fields of inceptiveness
Using bidirectional parallel corpus data for measuring and visualizing distances between lexemes in the semantic field of inceptiveness
Lore Vandevoorde, Gert De Sutter & Koen Plevoets
Research goal
To operationalize and measure semantic differences between translated and non-translated language
• Does the semasiological structure of polysemous and near-synonymous lexemes in translated Belgian Dutch differ from their structure in original Belgian Dutch?
• What is the role of source language and register?
Today’s research question
Can we find a technique that enables us to retrieve and visualize semantic fields AND that also allows for:
- register-sensitive comparison;
- cross-linguistic comparison;
- intra-linguistic comparison (translated vs. original language)?
CORPUS: Dutch Parallel Corpus
- Ten-million-word
- Sentence-aligned
- Parallel and comparable
- Balanced for 5 text types (= registers / genres): external communication, journalistic texts, instructive texts, administrative texts, literature
- Balanced for 4 translation directions: Dutch -> French, French -> Dutch, Dutch -> English, English -> Dutch
TECHNIQUE: Semantic Mirroring
A technique for meaning differentiation that uses translational data from parallel corpora.
Dyvik, H. (1998). A translational basis for semantics. In S. Johansson & S. Oksefjell (Eds.), Corpora and cross-linguistic research: theory, method, and case studies (pp. 51-86). Amsterdam: Rodopi.
Dyvik, H. (2004). Translations as semantic mirrors: from parallel corpus to Wordnet. In K. Aijmer & B. Altenberg (Eds.), Advances in Corpus Linguistics (pp. 311-326). Amsterdam & New York: Rodopi.
Semantic Mirroring: advantages
- No predetermined selection of lexical items
- Expansive: creates a semantic field
Dyvik (2004): “semantically closely related words ought to have strongly overlapping sets of translations”
Overlapping sets of translations should reveal semantic relations
Semantic Mirroring: example
Initial lexeme: bank
First T-image: English translations for bank
- bench, bank, desk
Inverse T-image:
- Dutch translations for bench: zitbank, bank
- Dutch translations for bank: oever, geldbedrijf, bank
- Dutch translations for desk: werktafel, schoolbank
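The bank example can be sketched in a few lines of Python. This is only an illustration under assumptions: the pair list `PAIRS` is hand-built from the example (it is not corpus output), and the function names `t_image` and `inverse_t_image` are hypothetical, not part of any existing tool.

```python
# Toy (Dutch lexeme, English lexeme) translation pairs built by hand from the
# "bank" example; a real run would extract such pairs from a parallel corpus.
PAIRS = [
    ("bank", "bench"), ("bank", "bank"), ("bank", "desk"),
    ("zitbank", "bench"),
    ("oever", "bank"), ("geldbedrijf", "bank"),
    ("werktafel", "desk"), ("schoolbank", "desk"),
]


def t_image(lexemes, pairs):
    """Return every translation observed for any lexeme in `lexemes`."""
    return {target for source, target in pairs if source in lexemes}


def inverse_t_image(lexemes, pairs):
    """Mirror the first T-image back into the source language.

    Returns a dict: each first-T-image lexeme -> its set of back-translations.
    """
    flipped = [(target, source) for source, target in pairs]
    return {lex: t_image({lex}, flipped) for lex in sorted(t_image(lexemes, pairs))}


first = t_image({"bank"}, PAIRS)
inverse = inverse_t_image({"bank"}, PAIRS)
```

Because the pairs are mirrored mechanically, the initial lexeme bank reappears in every back-translation set, including the one for desk.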
Case study: semantic field of inceptiveness in Dutch (BEGINNEN) and French (COMMENCER)
Initial lexeme
Select a concise set of near-synonyms using lexicographic data and substitution testing
Initial lexeme
Look up synonyms of beginnen in 8 dictionaries (no meta-information)
19 of the 104 synonyms occur in at least 3 dictionaries
Substitution testing for 19 synonyms
Set of 5 synonyms
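The dictionary step (keep synonyms listed in at least 3 dictionaries) can be sketched as a simple count. The synonym lists below are invented for illustration and do not reproduce the 8 dictionaries that were consulted; the function name `attested_synonyms` is hypothetical. Substitution testing is a manual step and is not modelled here.

```python
from collections import Counter


def attested_synonyms(dictionaries, min_dictionaries=3):
    """Keep synonyms that are listed in at least `min_dictionaries` dictionaries."""
    # set(entries) ensures a synonym counts once per dictionary.
    counts = Counter(syn for entries in dictionaries for syn in set(entries))
    return sorted(syn for syn, n in counts.items() if n >= min_dictionaries)


# Invented synonym lists for "beginnen" (illustration only).
TOY_DICTIONARIES = [
    ["aanvangen", "starten", "openen"],
    ["aanvangen", "starten", "aanvatten"],
    ["aanvangen", "starten", "inzetten"],
    ["aanvatten", "aanvangen"],
]

shortlist = attested_synonyms(TOY_DICTIONARIES)
```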
Initial lexeme
- Beginnen
- Aanvangen
- Starten
- Een aanvang nemen
- Van start gaan
- Aanvatten
Apply semantic mirroring
Create First T-image
Check all French translations of the SET of variants of BEGINNEN
74 different translations
Minimal overlap criterion + frequency threshold 5
T-image of 12 French lexemes
Apply semantic mirroring
Query T-image lexemes from the corpus
Check translations back into Dutch
Apply minimal overlap criterion + frequency threshold 5
Inverse T-image of 22 Dutch lexemes
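The two filtering passes (first T-image, then inverse T-image) can be sketched with one function applied in both directions. Two things are assumptions here: the counts in `PAIRS` are invented, and the minimal overlap criterion is read as "the candidate translates at least two different seed lexemes", which may differ from the criterion actually applied. The name `thresholded_t_image` is hypothetical.

```python
from collections import Counter, defaultdict


def thresholded_t_image(pairs, seeds, min_freq=5, min_overlap=2):
    """Count translations of the seed lexemes and keep those that pass both filters.

    pairs: iterable of (source_lexeme, target_lexeme), one per aligned occurrence.
    min_freq: frequency threshold (5 on the slides).
    min_overlap: ASSUMED reading of the overlap criterion: number of distinct
                 seed lexemes the candidate must translate.
    """
    freq = Counter()
    sources = defaultdict(set)
    for source, target in pairs:
        if source in seeds:
            freq[target] += 1
            sources[target].add(source)
    return {t: n for t, n in freq.items()
            if n >= min_freq and len(sources[t]) >= min_overlap}


# Invented (Dutch, French) occurrence pairs, for illustration only.
PAIRS = (
    [("beginnen", "commencer")] * 6 + [("starten", "commencer")] * 3
    + [("starten", "démarrer")] * 5 + [("beginnen", "démarrer")] * 1
    + [("beginnen", "débuter")] * 2 + [("aanvangen", "débuter")] * 1
    + [("opstarten", "démarrer")] * 5 + [("opstarten", "commencer")] * 1
    + [("lanceren", "démarrer")] * 2
)
SEEDS = {"beginnen", "starten", "aanvangen"}

# First T-image: French translations of the Dutch seed set.
first = thresholded_t_image(PAIRS, SEEDS)

# Inverse T-image: flip the pairs and use the first T-image as the new seed set.
flipped = [(french, dutch) for dutch, french in PAIRS]
inverse = thresholded_t_image(flipped, set(first))
```

In this toy run opstarten enters the inverse T-image although it was not a seed, which is the expansive effect of mirroring; lanceren is dropped by both filters.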
SELECTION
22 Dutch lexemes of inceptiveness:
aanvang-, aanvangen, aanvankelijk, aanvatten, begin, begin-, beginnen, eerst (adj), gaan, ingaan, instellen, invoeren, lanceren, openen, oprichten, opstarten, start, start-, starten, van start gaan, vanaf, vertrekken
VISUALIZATION
BUT: Inverse T-image = TL frequencies
Additional step: insert frequencies
Visualize
Data are summarized in frequency tables (rows: Dutch lexemes; columns: French lexemes). Excerpt:

              commencer  début  débuter  démarrer  …
aanvang-              0      4        0         0
aanvangen             0      1        3         1
aanvankelijk          1      9        0         0
aanvatten             1      0        0         2
begin                 2    264        2         0
…

Further columns: départ, entamer, instaurer, lancer, lancer (se), ouvrir, partir, prendre cours
Correspondence Analysis: a statistical technique designed for the visualization of frequency tables (Greenacre, 2007)
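As a minimal sketch of what Correspondence Analysis does with such a table, the standard formulation via a singular value decomposition of the standardized residuals can be written with NumPy. The input is only the part of the frequency table whose alignment is unambiguous on the slide (five Dutch rows, four French columns), so the resulting coordinates are illustrative and are not the ones plotted in the presentation. The function name `correspondence_analysis` is hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple CA of a two-way frequency table via SVD.

    Returns (row principal coordinates, column principal coordinates,
    principal inertias per dimension).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model r * c.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2


# Rows: aanvang-, aanvangen, aanvankelijk, aanvatten, begin
# Columns: commencer, début, débuter, démarrer
EXCERPT = np.array([
    [0,   4, 0, 0],
    [0,   1, 3, 1],
    [1,   9, 0, 0],
    [1,   0, 0, 2],
    [2, 264, 2, 0],
])

rows, cols, inertia = correspondence_analysis(EXCERPT)
# The first two columns of `rows` are the coordinates one would plot.
```

The sum of the principal inertias equals the chi-square statistic of the table divided by its total count, which gives a quick sanity check on any implementation.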
[Figure: first-order Correspondence Analysis plot of the 22 Dutch lexemes, dimensions 1 and 2; n = 5323]
First-order Correspondence Analysis
Lexemes of the same word class seem to cluster together.
This would imply that:
- aanvankelijk [initially] and begin [beginning] are semantically closer to each other than begin [beginning] is to beginnen [to begin]
Solution?
- Insert earlier stages of the Semantic Mirroring into the analysis
- Exclude possible interaction with other semantic fields
- Base the semantic space on the six lexemes and their first-order translations
Second-order Correspondence Analysis
Anchor 12 points (French lexemes) with respect to the 6 Dutch lexemes (Initial lexeme)
[Figure: second-order Correspondence Analysis plot, dimensions 1 and 2: the 12 French lexemes anchored with respect to the 6 initial Dutch lexemes]
Second-order Correspondence Analysis
Visualize the 22 Dutch lexemes based on the 12 anchored points (“supplementary” points)
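One common way to place supplementary points in Correspondence Analysis is to fit the space on the active table only and then project each extra row through its profile and the standard column coordinates. The sketch below assumes that reading: active rows stand for the initial lexemes, supplementary rows for further Dutch lexemes, and columns for French lexemes. All counts are invented, the tables are far smaller than the real 6 x 12 and 22 x 12 ones, and the names `fit_ca` and `project_supplementary_rows` are hypothetical.

```python
import numpy as np


def fit_ca(active, n_dims=2):
    """Fit CA on the active table; return row principal and column standard coordinates."""
    N = np.asarray(active, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    col_standard = Vt.T[:, :n_dims] / np.sqrt(c)[:, None]
    return row_principal, col_standard


def project_supplementary_rows(rows, col_standard):
    """Project extra rows into the fitted space without letting them shape it."""
    R = np.asarray(rows, dtype=float)
    profiles = R / R.sum(axis=1, keepdims=True)   # each row sums to 1
    return profiles @ col_standard


# Invented counts (illustration only): 4 active rows, 3 columns.
ACTIVE = np.array([[20, 5, 2], [6, 15, 3], [2, 4, 12], [10, 8, 1]])
SUPPLEMENTARY = np.array([[3, 9, 1], [1, 1, 6]])

active_coords, col_standard = fit_ca(ACTIVE)
supp_coords = project_supplementary_rows(SUPPLEMENTARY, col_standard)
```

A useful property of this projection is that an active row passed through it lands exactly on its own active coordinates, so supplementary and active points share one space.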
[Figure: second-order Correspondence Analysis plot, dimensions 1 and 2: the 22 Dutch lexemes shown as supplementary points together with the 12 anchored French lexemes]
[Figure: second-order Correspondence Analysis plot, dimensions 1 and 2: the 22 Dutch lexemes only; n = 5323]
- Separate clusters are now meaning-based and formed independently of the word class of each lexeme
- Lexemes follow a descending line towards more outlying, more formal lexemes
Semantic field of BEGINNEN: journalistic texts
[Figure: second-order Correspondence Analysis plot of the semantic field of BEGINNEN, journalistic texts, dimensions 1 and 2; n = 1468]
Semantic field of BEGINNEN: external communication
[Figure: second-order Correspondence Analysis plot of the semantic field of BEGINNEN, external communication, dimensions 1 and 2; n = 928]
Register-specific plots give insight into how register influences the structure of semantic fields
Cross-lingual comparison: BEGINNEN and COMMENCER
[Figure: Correspondence Analysis plot of the French semantic field of COMMENCER, dimensions 1 and 2. Lexemes shown: commencer, créer, début, débuter, démarrer, départ, entamer, entreprendre, lancement, lancer, lancer (se), mettre sur pied, ouvrir, partir]
Intra-lingual comparison: translated and original BEGINNEN
[Figure: two Correspondence Analysis plots of the semantic field of BEGINNEN side by side, dimensions 1 and 2; panels: original, translated]
- Many similarities between the translated and original semantic field
- Noteworthy differences: To what extent can translated language represent original language?
Conclusion
The translational method developed here is bottom-up, quantitative and corpus-based. It enables us to measure and visualize translated and original semantic fields, including register-specific ones, which was our main methodological concern.
BUT:
- Extra testing with a different pivot language
- Carry out a Semantic Mirroring with a single lexeme
Questions for the future
If we are able to visualize semantic structures (including register-specific ones) in translated and non-translated texts, then we can aim at providing evidence for the Gravitational Pull Hypothesis (Halverson 2003, 2007).
Apply semantic mirroring to a field with more salient semantic differences
Possibilities for the future
- Automation of the retrieval process (using word-aligned corpora)
- Add morpho-syntactic ID-tags to the translation-annotated sentences
Thank you!