Finding a categorisation in references from case-law to legislation
A.R. (Alexander) van Someren
10169547
Bachelor thesis
Credits: 18 EC
Bachelor Opleiding Kunstmatige Intelligentie (Bachelor's programme in Artificial Intelligence)
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam
Supervisors
Dhr. dr. R.G.F. (Radboud) Winkels
Dhr. dr. A.W.F. (Alexander) Boer
Leibniz Center for Law
Faculty of Law
University of Amsterdam
Science Park 904
1098 XH Amsterdam
June 27th, 2014
Abstract
This thesis investigates whether it is possible to categorise references from case law to legislation. Since literature on this subject is scarce, a data-driven approach is proposed to see whether the properties that describe the references give these references a natural inclination to cluster. A parser was built to extract the references from Dutch case law. From these data, features were identified and used as the basis for the analysis. Using an unsupervised clustering algorithm (expectation maximisation), it was found that such a natural categorisation indeed exists. This provides a proof of principle, but leaves a legal interpretation open for further research.
Acknowledgements
I would like to thank Antoinette van Muntjewerff, Marco Loos and Ronald Beltzer for their time and expertise. Furthermore, I would like to thank my supervisors, Radboud Winkels and Alexander Boer, for their guidance and help. Finally, I would like to thank Bart Vredebregt a.k.a. the regex king, and Tony, who provided a moment of calm when Bart and I needed it.
Contents
1 Introduction
2 Finding and resolving references
   2.1 Raw Data
   2.2 Parser
      2.2.1 Regular Expression
      2.2.2 Resolvement based on previous finding
   2.3 Evaluation
      2.3.1 Finding references
      2.3.2 Resolving references
3 Method
   3.1 Available data
   3.2 Feature selection
      3.2.1 Related work
      3.2.2 Considered features
      3.2.3 Applied features
   3.3 Clustering
   3.4 χ² Attribute Evaluation
4 Results
   4.1 Test run
      4.1.1 Data
      4.1.2 χ² Evaluation
      4.1.3 Interpretation
   4.2 Final run
      4.2.1 Data
      4.2.2 χ² Evaluation
      4.2.3 Interpretation
5 Conclusion
   5.1 Findings
   5.2 Discussion
   5.3 Suggestions for further research
Appendices
A Keyword Regular Expressions
B Raw results
   B.1 Test round
   B.2 Final round
C Example of generated Ecli file
Chapter 1
Introduction
Finding relevant legal sources is a difficult and time-consuming task for jurists, partly because a variety of sources is available, such as legislation, case law and a number of (scholarly) interpretations, which together add up to an enormous number of possibilities. Another reason is that the relevance of a legal source is difficult to determine: it can differ per person, just as the tasks for which a jurist needs the source can differ. Currently, jurists use legal journals to simplify this task. Drawing on the expertise of jurists and journalists who follow legal matters closely, these journals select and publish the legal sources they deem relevant. This selection naturally reduces the number of available sources, making it less laborious to identify the relevant ones for a specific goal. With the increasing availability of open sources such as case law, it has become possible for a jurist to present a source in court that was never discussed in a journal. This adds a new dimension to legal practice. However, the documents that are available are neither selected on relevance nor categorised in any way. This lays the ground for research into systems that assist jurists in identifying relevant sources. Several such systems have been proposed; they differ in the type of source whose relevance is to be determined. Mazzega et al. (2009) performed network analysis on French legislation, using the references within legislation as the links between the nodes of the network. This reference-network approach motivated Winkels et al. (2013) to apply an analogous analysis to Dutch legislation and to use centrality measures as indicators of the relevance of (other) legislation, given one article that is in focus.
CHAPTER 1. INTRODUCTION
Fowler et al. (2007) used a similar method on case law.1 By analysing a network of 27,681 opinions written by the U.S. Supreme Court, they tried to assess the relevance of the opinions. They found that information in the network can be used to outperform conventional methods of defining case relevance. Earlier, Jackson et al. (2003) designed a system named "History Assistant", which identifies relevant preceding cases when a new case is provided. It combines the citation network with natural language processing techniques, such as sentence similarity for the title and the occurrence of direct-history language (such as "affirmed" or "motion denied"). Jackson et al. (2003) concluded, among other things, that it can be useful to combine symbolic techniques with statistical approaches. In the Netherlands, Winkels et al. (2011) similarly studied whether the reference network in Dutch case law could be used to identify relevant court opinions. They found that this was indeed possible, but (due to the way the Dutch legal system functions) only for higher courts. Van Opijnen (2014, chap. 7) introduces a Model for Automated Rating of Case law (MARC). Like the studies mentioned above, van Opijnen (2014) looks at the reference network between cases. However, since only 8% of Dutch court opinions are cited once or more, other features were needed as well. Among these are: the type of court (with higher or lower competence), the age of the court opinion, the field of law, the length of the court opinion and the references to legislation. This thesis focuses on the last of these. Apart from what van Opijnen (2014) writes, no literature has been found about references from case law to legislation. A simple reason for this could be that the right data is unavailable: at least in the Netherlands, there is no open data containing case law with labelled and resolved references.
The first part of this thesis therefore describes a parser that finds and resolves references in case law and thereby makes this data available. A second possible reason for the shortage of literature on this subject emerged after consulting several experts in the field of case law. Although they stated that references could serve different goals and that some were more important or more typical for a specific case, they could neither point to any literature supporting this nor remember it being covered in their academic education. Arguably, this is because identifying typical or important references in case law comes naturally to jurists. From the perspective of systems that assist jurists in identifying relevant sources, it would be interesting to discover whether this identification, which comes naturally to jurists, could be done by a computer. Being able to categorise references automatically could advance network analysis similar to the studies described above, since the edges in the citation network could then be of different types (i.e. carry different labels). These two possible reasons suggest that this is a fertile part of the research area (i.e. interdisciplinary research into applying computer-science and artificial-intelligence techniques to law). However, a third reason could be that a categorisation as described above is not possible at all, i.e. that it simply does not exist. The fact that jurists are able to select the most relevant references could be explained by their having studied the laws that the references point to, so that they know which of those are interesting or debatable. Nevertheless, this thesis explores the first two reasons by investigating whether an automatic categorisation of references is possible. To this end, it is first necessary to find the references in the case law and to resolve them. This was done using a parser based on a regular expression; chapter 2 explains this method in more detail. This thesis only looks at references from case law on the subject of immigration. Only one field was selected to simplify the parsing task (different judges use different styles). Immigration was chosen because it connects well to other research conducted at the Leibniz Center for Law. Subsequently, chapter 3 covers the search for a categorisation of these references. It discusses which properties of those references can be used (i.e. feature selection) and how those features can be used to find a natural categorisation. Since no literature was found on different types of references, a bottom-up approach is employed.
1 In the US, case law is a much more important source of law than in e.g. the Netherlands, where a judge rarely refers to jurisprudence.
Using the selected features, an Expectation Maximisation algorithm (an unsupervised clustering algorithm) is applied to find a possible natural structure in the data. If the data cluster, that phenomenon in itself would be evidence that different groups of references exist. This is only useful if those clusters are also clearly distinguishable. The properties that best describe the clusters are discussed in section 4.2. A legal interpretation of the clusters could yield a better understanding of different types of references and thus lay the ground for new research into the reference structures within legal sources. Additionally, this could add a new dimension to network analysis by using the clusters as labels for the edges in such networks. However, this legal interpretation is not performed in this thesis; instead, the thesis focuses on the data aspect and presents the findings in a way that jurists should be able to comprehend.
Chapter 2
Finding and resolving references
2.1 Raw Data
The data used for this research were collected from www.rechtspraak.nl, the official website of the Dutch courts. The data are XML documents, each describing a court opinion. Apart from the actual text, which is available per paragraph (within <para> tags), the documents contain some meta-data. Three meta-data elements that are relevant for this research are summarised here:
• an element containing the date of the verdict;
• an element containing the field or fields of law the case concerns;
• an element containing the name of the court that gives the verdict.
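As a minimal sketch of how the paragraph text could be read from such a document, the Python function below collects the text of every <para> element using the standard-library ElementTree module. Only the <para> tag comes from the description above; ignoring XML namespaces is an assumption, and the wrapper elements in the usage example are made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def paragraphs(xml_text: str) -> List[str]:
    """Collect the text of every <para> element, ignoring XML namespaces (assumption)."""
    root = ET.fromstring(xml_text)
    result = []
    for element in root.iter():
        # Namespaced tags look like "{uri}para"; compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "para":
            # itertext() also includes text of nested inline elements.
            text = "".join(element.itertext()).strip()
            if text:  # skip empty paragraphs
                result.append(text)
    return result


# Hypothetical document; only the <para> element reflects the real data.
example = ('<document xmlns="http://example.org/ns"><body>'
           '<para>Eerste alinea.</para>'
           '<para>Zie <emphasis>artikel 28</emphasis> Vw 2000.</para>'
           '</body></document>')
print(paragraphs(example))
```

Joining `itertext()` keeps inline markup (here `<emphasis>`) from splitting a reference across fragments, which matters when the reference parser is run on the paragraph text.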
The (text of the) court opinion contains references to laws. Since the goal of this research is to find a categorisation of references, those references need to be available in a computer-readable format. To achieve this, a method was implemented that (1) finds references in the court opinions, (2) finds the URL and identifier that each reference refers to and (3) adds these back to the XML document in the same style (and according to the same standards) as the original document. The next section (section 2.2) discusses how the references were found and resolved (i.e. assigned a link to the law document they refer to). Subsequently, section 2.3 discusses the performance of the proposed parser. How the references are added back to the original document is not relevant for this thesis and will therefore not be discussed.1
2.2 Parser
The process of finding and resolving references works as follows. Given a textual document (the court opinion), the algorithm2 identifies the references in the text using a regular expression, which uses different matching groups to match different parts of the reference (see section 2.2.1). Subsequently, the name of the law is looked up in a dictionary that contains the law-identification (BWB) number together with a set of names and abbreviations commonly used by judges.3 The law-identification number is then used to resolve the reference by generating a URL. In the process of finding and resolving references, high precision is more important than high recall. Since the ultimate goal is to find natural categories in references, lower precision would result in noisier data, which would harm the performance of the clustering algorithm. Lower recall only limits the amount of available data, and since 13,311 court opinions are available, adding up to more than 100,000 references, a lower recall is not really a problem. Nevertheless, a high recall is still aimed for, since the algorithm might otherwise miss all references of a certain category, which would evidently harm the research.
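The resolution step can be sketched in Python as a dictionary lookup followed by string formatting. The two dictionary entries below are a hypothetical excerpt (the real dictionary is the BWBIdList.xml file linked in footnote 3), the case and whitespace normalisation is an assumption, and the URL pattern is copied from the example URL for article 28 of the Vreemdelingenwet 2000 given in chapter 3. Only the article level is handled here.

```python
from typing import Dict, Optional

# Hypothetical two-entry excerpt of the name/abbreviation -> BWB number dictionary.
BWB_IDS: Dict[str, str] = {
    "vreemdelingenwet 2000": "BWBR0011823",
    "vw 2000": "BWBR0011823",
}

# URL prefix taken from the example URLs in chapter 3.
BASE_URL = "http://doc.metalex.eu:8080/page/id"


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so spelling variants share one key (assumption)."""
    return " ".join(name.lower().split())


def resolve(law_name: str, article: str) -> Optional[str]:
    """Return the URL of the referenced article, or None if the law name is unknown."""
    bwb_id = BWB_IDS.get(normalise(law_name))
    if bwb_id is None:
        return None  # unresolved: costs recall, but keeps precision high
    return f"{BASE_URL}/{bwb_id}/artikel/{article}"


print(resolve("Vreemdelingenwet 2000", "28"))
```

Returning `None` for unknown names, rather than guessing, mirrors the stated preference for precision over recall.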
2.2.1 Regular Expression
After inspecting a sample of documents, it was concluded that judges are consistent in the way they refer to laws. Therefore, a regular expression was used as the parser for finding references and extracting their parts. The regular expression is shown in listing 2.1.

Listing 2.1: Regular expression for capturing references and their sub-parts

```python
import re

# Written with raw strings (r'...') so that backslashes reach the regex engine unchanged.
regex = (
    # Matches Artikel and captures the number (and letter)
    # combination for the article
    r'([^a-zA-Z](?:(?:[Aa]rtikel|[Aa]rt\.?) ([0-9]'
    r'[\(0-9a-zA-Z:.\)]*)|[Bb]oek ([0-9][\(0-9a-zA-Z:.\)]*)|'
    r'[Hh]oofdstuk ([0-9][\(0-9a-zA-Z:.\)]*)),?'
    # matches "lid .. (tot en met ...)"
    r'((?:\s+(?:lid|aanhef en lid|aanhef en onder|onder)?'
    r'(?:[0-9a-z ]|tot en met)+,?'
    # matches a word followed by "lid" e.g. "eerste lid"
    r'|,? (?:[a-z]+ lid|[a-z]+ en [a-z]+ lid),?)*)'
    # captures "onderdeel ..."
    r'(,? onderdeel [a-z],?)?'
    # captures "sub ..."
    r'(,? sub [0-9],?)?'
    # matches e.g. "van de wet "
    r'(?:(?: van (?:de|het|)(?: wet)?|,?)? *'
    # matches the Title
    r'((?:(?:wet|bestuursrecht|Wetboek van|op het '
    r'[A-Z0-9][a-zA-Z0-9]*|[A-Z0-9][a-zA-Z0-9]*)'
    r'(?:[^\S\n]*|\.))+))? *'
    # matches anything between () after the title
    r'(?:\(([^\)]+?)\))?)'
)

reference_pattern = re.compile(regex)
```

The first matching group is very important. If a reference does not contain the words "article" ("artikel"), "book" ("boek") or "chapter" ("hoofdstuk"), it will not be found by this parser. This is not a big problem, since judges rarely refer to a law without further specification. The title is also very important. As mentioned, the title is matched against the dictionary that contains the BWB numbers. Using that BWB number and the article information from the first group, a URL is composed that points to the specific article in the law. If the algorithm does not find a match, it also tries the last capturing group, which captures anything within brackets after the reference. This logic was added to capture abbreviations that often follow a reference to a law with a long name. Since the BWB dictionary also contains abbreviations, this increases the recall of the algorithm.
1 However, a result can be found in appendix C.
2 The Python program that implements this method was written in collaboration with Bart Vredebregt. Its source code can be found at https://github.com/b8vrede/cite_parser/
3 The dictionary can be found at https://github.com/b8vrede/cite_parser/blob/master/BWBIdList.xml
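The fallback on the bracketed group amounts to trying two lookups in order. In the sketch below, `lookup`, `resolve_title` and the dictionary excerpt are hypothetical; the point is only the order of attempts: the captured title first, then whatever was captured between brackets after it.

```python
from typing import Dict, Optional

# Hypothetical excerpt of the BWB dictionary; full names and abbreviations are both keys.
BWB_IDS: Dict[str, str] = {
    "vreemdelingenwet 2000": "BWBR0011823",
    "vw 2000": "BWBR0011823",
}


def lookup(name: Optional[str]) -> Optional[str]:
    """Look up a (possibly missing) law name after case/whitespace normalisation."""
    if not name:
        return None
    return BWB_IDS.get(" ".join(name.lower().split()))


def resolve_title(title: Optional[str], bracketed: Optional[str]) -> Optional[str]:
    """Try the captured title first; fall back to the text captured between brackets."""
    return lookup(title) or lookup(bracketed)


# The title alone is unknown here, but the abbreviation in brackets resolves.
print(resolve_title("Vreemdelingenwet", "Vw 2000"))
```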
2.2.2 Resolvement based on previous finding
When this parser was being constructed, it appeared that judges were not always consistent in their style of referring, specifically when they referred to the same law twice in one paragraph. An example is: "Pursuant to article 7 EVRM ... this is contrary to article 25 of that law". "That law" is obviously not present in the BWB dictionary. It is therefore desirable to let such references copy the BWB number from the previous match. To solve this problem, a rule was implemented that lets the current reference inherit the BWB number when the following constraints are met: (1) a previous law exists, (2) the previously found reference was resolved and (3) either no law was matched by the regular expression or the matched law is on a "blacklist". This blacklist was composed manually and contains the following keywords: "deze wet", "nederland", "onze minister", "onze", "wet", "nederlanders", "nederlanderschap", "kroon", "koninkrijk". The next section discusses the performance of these methods.
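A minimal Python sketch of the inheritance rule, assuming the references of one paragraph are processed in textual order and that the captured law name is compared in lower case against the blacklist. The `Reference` class and the `apply_inheritance` function are illustrative names, not taken from the cite_parser code.

```python
from dataclasses import dataclass
from typing import List, Optional

# The manually composed blacklist from the text.
BLACKLIST = {"deze wet", "nederland", "onze minister", "onze", "wet",
             "nederlanders", "nederlanderschap", "kroon", "koninkrijk"}


@dataclass
class Reference:
    law: Optional[str]            # law name captured by the regular expression, if any
    bwb_id: Optional[str] = None  # resolved law-identification number, if any


def apply_inheritance(references: List[Reference]) -> None:
    """Let a reference inherit the BWB number of the previous reference (in place)."""
    previous: Optional[Reference] = None
    for ref in references:
        law = (ref.law or "").strip().lower()
        # Constraint (3): no law was matched, or the matched law is blacklisted.
        unusable_law = not law or law in BLACKLIST
        # Constraints (1) and (2): a previous reference exists and it was resolved.
        if previous is not None and previous.bwb_id is not None and unusable_law:
            ref.bwb_id = previous.bwb_id
        previous = ref  # chains: an inherited number can be inherited again


refs = [Reference("Vw 2000", "BWBR0011823"), Reference(None), Reference("deze wet")]
apply_inheritance(refs)
print([r.bwb_id for r in refs])
```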
2.3 Evaluation
The performance of this parser was measured on two levels: the performance of finding the references and the performance of resolving them. Both are discussed here.
2.3.1 Finding references
For the finding aspect, 21 court opinions were randomly selected, and all references in them were identified manually. Here, a reference means a reference to a law, treaty or guideline; "Article 5 and 7 EVRM" counts as one reference in this evaluation. With these definitions and a sample of 163 references, a precision of 0.993 and a recall of 0.870 were found. The F-measure of the reference finding is 0.925. The high precision was expected due to the strictness of the regular expression. A recall of 0.870 was also in line with expectations, as it was clear that the regular expression had some trouble. In particular, it often failed to recognise laws with long names like "Europees Verdrag tot bescherming van de rechten van de mens en de fundamentele vrijheden" (in English: "Convention for the Protection of Human Rights and Fundamental Freedoms"). This is due to the length of the name: when the regular expression was extended to capture such names, it matched too easily, often matching entire sentences where it should have matched only the law.
2.3.2 Resolving references
For the resolving part (i.e. finding the right URL given a found reference), a precision of 0.952 was found with a sample size of 250.4 The recall that was found is 0.846, resulting in an F-measure of 0.896. The recall was higher than expected. The expectations were lower because, as mentioned, the regular expression had some trouble matching laws with long names. That the recall was nevertheless higher than expected can be explained by the performance of the inheritance logic.
4 This sample size is larger simply because this evaluation was less difficult and thus less time-consuming (it was not necessary to read entire court opinions).
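The F-measure used in this section is the harmonic mean of precision and recall, F = 2PR / (P + R). As a quick check in Python, plugging in the resolving results (precision 0.952, recall 0.846) gives approximately 0.896.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both zero: define F as 0 instead of dividing by zero
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(0.952, 0.846), 3))
```

Because it is a harmonic mean, the F-measure stays close to the lower of the two values, so the recall of 0.846 dominates the result here.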
Chapter 3
Method
This chapter describes the methodology that was used to find structure in the references from (immigration) case law to legislation. The form and properties of the available data are discussed in section 3.1. Subsequently, section 3.2 discusses how these raw data can be used as input to a clustering algorithm; in machine learning, this is called feature selection. Section 3.3 then describes the clustering algorithm that was used to obtain the results. Finally, section 3.4 briefly describes the χ² method that was used to determine which features were the most distinguishing for the clusters found.
3.1 Available data
The data used for the exploratory research consist of XML files containing court opinions, the references and some meta-data.1 The relevant meta-data are discussed in section 2.1. In total there are 13,311 of these documents, with over 100,000 found and resolved references.
3.2 Feature selection
3.2.1 Related work
As mentioned in the introduction, no literature was found on references from case law to legislation apart from van Opijnen (2014, chap. 7). To determine the relevance of case law, he uses certain "variables" of references from case law to legislation. These "variables" are called "features" in this thesis, to be consistent with other literature. The features van Opijnen distinguishes are the following:
• The reason for referring. Van Opijnen distinguishes two types of references: procedural ones and substantive ones. Since van Opijnen is only interested in substantive references, he removes all references to a specific article that is cited more than 2000, 1000 or 500 times.
• The multiplicity of the reference. This is how often one court opinion cites one article.
• The temporal validity aspect. Whether the referred article is current or lapsed.
• The hierarchical position of the referred law. Some laws are higher in the hierarchy than others; e.g. European law is higher in the hierarchy than local government legislation.
• The document structure level of the reference. The difference between a reference to a book and a reference to an explicit article or subsection.
The following discusses whether the features van Opijnen (2014) identifies also apply to this research, keeping in mind that this research does not directly focus on the relevance of case law, but on the references themselves and whether these can be categorised. The first feature, the reason for referring, is handled in an arguably simple way. Although identifying these two types might be interesting for this research, filtering out all references to articles that are cited more often than some threshold, as van Opijnen does, is argued to be too rigorous for this research. This research focuses on finding natural structures in the data, working towards a categorisation by type of, or reason for, referring, or possibly by the role a reference plays in a court opinion. Categorising the reason for referring in such a simple way is too crude for this approach.2 The multiplicity of the reference could be an interesting feature for this research. Van Opijnen states that the more often a judge refers to a certain article, the more likely it is that the article is important. It could also be that references that occur only rarely are of one type. The temporal validity aspect is presumably not interesting in this setting. It is a safe assumption that judges mostly refer to current legislation, since lapsed legislation is no longer valid. Situations in which older cases involving older legislation are discussed fall outside the scope of this research.
1 An example XML file can be found in appendix C.
This could be seen as a (rare) different category. The hierarchical position of the referred law could be a useful feature in this setting. When trying to find reasons for referring or roles that references play in a court opinion, it is could be that these categories exist within each level of hierarchy. However it could also be possible that references with a certain role never refer to e.g. European law. The document structure level of the reference is presumably an interesting feature. A lower document structure level suggests a more specific reference, which could indicate a different role.3 This concludes the discussion of the features used by van Opijnen (2014, chap. 7). Section 3.2.2 will discuss other features that have been considered and which have been selected for actually usage. Subsequently, section 3.2.3 will discuss those features that were used in the clustering algorithm and will focus on how these were generated. 2
It is arguable that a label of relative frequency of the a reference in fact would be a good feature. This will be discussed in section 5.1 3 Notice that the arguments supporting whether a feature could benefit the analysis can be indefinite. This is due to the indistinctness of the analysis task. Since this thesis describes exploratory research to some sort of categorisation, solid arguments can be hard to provide.
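The multiplicity feature and a van Opijnen-style frequency filter both reduce to counting. The sketch below assumes each found reference is available as a hypothetical (opinion_id, article_url) pair; the function names and the single `threshold` parameter are illustrative, standing in for the thresholds of 2000, 1000 or 500 mentioned above.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Citation = Tuple[str, str]  # hypothetical (opinion_id, article_url) pair


def multiplicity(citations: Iterable[Citation]) -> Counter:
    """How often each court opinion cites each article."""
    return Counter(citations)


def frequently_cited(citations: Iterable[Citation], threshold: int) -> Set[str]:
    """Articles cited more than `threshold` times over the whole collection."""
    totals = Counter(article for _, article in citations)
    return {article for article, count in totals.items() if count > threshold}


cites = [("A", "art-x"), ("A", "art-x"), ("B", "art-x"), ("B", "art-y")]
print(multiplicity(cites)[("A", "art-x")])  # opinion A cites art-x twice
print(frequently_cited(cites, 2))           # art-x is cited 3 times in total
```

Keeping the counts, instead of dropping frequently cited articles, leaves the decision about what is "too frequent" to the clustering step, in line with the argument above that a hard filter is too rigorous for this research.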
[Figure 3.1: bar chart of keyword frequencies. Keywords shown (in Dutch, as extracted from the chart): ingevolge (als/in) bedoeld(e) (in/onder) op grond van met/om toepassing van/aan (in …) (is) bepaald(e) (in/met) strijd(ige) (oplevert/gehandeld) (met) in de zin van (zich/met) beroep(en) op krachtens juncto (/jo) … (is) niet van toepassing (is) (in … is) neergelegd (in) in … beschreven gelet op schending van volgt uit in … gemaakte onderscheid (als) vermeld (in/onder) in afwijking van in … voorgeschreven inzake in samenhang met … zoals blijkt uit (de tekst van) (niet) op de voet van … geldt binnen de reikwijdte van … vallen (in …) (is) opgenomen (in …) als genoemd in … op basis van voldoen aan bevat de bevoegdheid voorziet behoefte moet … getoetst worden luidt … : als vervat in toepasselijkheid van … getoetst (dient) te worden aan … op … gebaseerde te vinden in in … wordt gesteld het in … gegeven recht (om) in combinatie met niet op gespannen voet zal staan met … in … gegeven bevoegdheid aan … voldoet vloeit uit … voort kan … worden geïnterpreteerd als inachtnemen van … in het kader van … ex … gedefinieerd in .. in de …-procedure voorlagen overtreding van … bescherming van uitleg van … in … gebezigde begrijp … is (…) relevant]
Figure 3.1: Frequencies of keywords manually found around all 229 references in 30 randomly selected court opinions.
3.2.2 Considered features
This section discusses all the features that were considered during the research and explains why they were or were not selected for the analysis. It starts with one particular feature (i.e. the keyword) and then enumerates the simpler ones.
Keywords
In the process of trying to identify different categories, 30 randomly selected court opinions were examined for repeating patterns. Notable were recurring words or phrases around the reference (from now on referred to as keywords). The most frequent ones are "pursuant to" (in Dutch: "ingevolge") and "as referred to in" (in Dutch: "als bedoeld in"). Figure 3.1 lists all the discovered (Dutch) keywords with their frequency in the sample.
It is hoped that these keywords are in some way typical of categories of references. To investigate whether judges were aware of why they used different keywords, the matter was presented to three experts in this field. Two of them are judges with experience in writing verdicts; the third teaches students how to read court opinions. The two judges were very sceptical and did not think that the keywords would be good indicators for categories; in fact, they did not think that such categories existed at all. The third expert was very interested and came up with a categorisation of the keywords, using cards with a keyword on the front and a sample sentence on the back. She defined the categories as follows:
• The first category of keywords has a selecting nature: it identifies the laws that are essential to a case. Keywords that belong to this category are: "applying ..." ("met toepassing aan"), "on the basis of ..." ("op grond van"), "as laid down in" ("als neergelegd in") and "invoking ..." ("met beroep op").
• A second category indicates a more applicative reference. This category implies that the reference is subject to interpretation and is more about the meaning or content of the referred provision. This category contains the following keywords: "pursuant to ..." ("ingevolge"), "as referred to in ..." ("als bedoeld in"), "by virtue of ..." ("krachtens"), "on the subject of ..." ("inzake"), etc.
• The third category has a concluding (denying) function; it answers the first category. Keywords belonging to this category are: "in breach/contravention of" ("met schending van"), "as a departure from" ("in afwijking van"), "in violation of" ("in strijd met") and "... does not apply" ("is niet van toepassing").
• The last category consists only of the keyword "in conjunction with" ("juncto" or "in samenhang met"). This is arguably an uninteresting category.
This is the categorisation of only one jurist, but it is interesting that she was able both to categorise the cards and to clarify the nature of those categories. This provides some background knowledge for interpreting the results. The following explains how the references were labelled with a keyword. For every keyword occurring more than twice, a regular expression was constructed to automatically identify which keyword was present for a given reference. These regular expressions are stored in a CSV file and sorted in the same way as in figure 3.1 (i.e. the more frequent the keyword, the higher its position). The labelling algorithm works as follows (algorithm 1):
Algorithm 1: Pseudocode for labelling a reference with a keyword
Data: the reference string, the paragraph in which this string occurs and a list of regular expressions for the keywords
Result: a reference label
referenceNeighbourhood ← 60 characters before and after the reference;
for each regular expression do
    if regular expression matches referenceNeighbourhood then
        return the keyword label corresponding to the regular expression;
    end
end
return none;

This algorithm greedily chooses the first matching regular expression, without considering alternatives. This could make the resulting distribution, presented in figure 3.2, more skewed than the true one, but it is a relatively simple way to label the references. To counteract this skewness, the first regular expressions were made more complex to avoid false positives, as can be seen in appendix A. In total, 152,853 references were found in the domain of immigration law. Of those, 120,855 (79%) were assigned a keyword. The keywords could reveal different categories occurring naturally in the data and will therefore be used in the eventual analysis.
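Algorithm 1 translates fairly directly into Python. In the sketch below, the three keyword patterns are hypothetical, heavily simplified stand-ins for the real expressions in appendix A, and the reference is assumed to be given by its character offsets in the paragraph. The 60-character window and the first-match-wins loop follow the pseudocode.

```python
import re
from typing import List, Optional, Tuple

# Characters of context taken on each side of the reference (as in Algorithm 1).
WINDOW = 60

# Hypothetical simplified patterns, ordered from most to least frequent keyword.
KEYWORD_PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("ingevolge", re.compile(r"\bingevolge\b", re.IGNORECASE)),
    ("(als/in) bedoeld(e) (in/onder)",
     re.compile(r"\b(?:(?:als|in) )?bedoelde? (?:in|onder)\b", re.IGNORECASE)),
    ("op grond van", re.compile(r"\bop grond van\b", re.IGNORECASE)),
]


def label_reference(paragraph: str, start: int, end: int,
                    patterns: List[Tuple[str, re.Pattern]] = KEYWORD_PATTERNS) -> Optional[str]:
    """Label the reference at paragraph[start:end] with the first matching keyword."""
    # Neighbourhood: the reference plus up to WINDOW characters on either side.
    neighbourhood = paragraph[max(0, start - WINDOW):end + WINDOW]
    # Greedy: the first pattern that matches wins; later patterns are not considered.
    for label, pattern in patterns:
        if pattern.search(neighbourhood):
            return label
    return None


text = "De aanvraag is ingevolge artikel 28 van de Vreemdelingenwet 2000 afgewezen."
start = text.index("artikel")
print(label_reference(text, start, text.index(" afgewezen")))
```

Because the loop stops at the first match, a neighbourhood containing both "op grond van" and "ingevolge" is always labelled "ingevolge", which is exactly the skew towards frequent keywords discussed above.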
Other features
Other features that were considered during the research process are the following:
Bag of Words
In text analysis, it is common to transform the text into a vector of word occurrences. Although this can be a very powerful tool, it also increases the complexity, since every word translates into an additional dimension. Such high dimensionality is unsuitable for some algorithms. In our data, a bag of words could be constructed from the paragraph in which the reference occurs. Note that one paragraph could (and frequently does) contain several references, which would then all have the same word vector.
Position of reference in verdict
This is a rather self-explanatory feature. The idea behind it is that different sections of a verdict play different roles. Although these sections are not marked (in a computer-readable way), their order remains more or less constant. The relative position of the reference should therefore give a rough estimate of the section in which the reference occurs. Figure 3.3 plots the distribution of this variable. It is noticeable that the reference density is higher in the first section of a court opinion.
Position of the referred article in law
This feature is much like the one above, but concerns the position of the article in the law. The idea is the same: just like court opinions, laws are structured in a consistent manner. This feature could therefore indicate a certain role the article plays in the law, and thus a certain role of the reference in the court opinion. An example is that definitions often appear at the beginning of a law.
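One possible way to compute the relative position of a reference, and to bin it into a histogram like the one in figure 3.3, is sketched below. Measuring the position as a character offset divided by the length of the verdict is an assumption; a paragraph index divided by the number of paragraphs would work the same way. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def relative_position(text_length: int, reference_offset: int) -> float:
    """Relative position of a reference in a verdict: 0.0 is the start, near 1.0 the end."""
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return reference_offset / text_length


def position_histogram(positions: Iterable[float], bins: int = 10) -> Counter:
    """Count relative positions per equal-width bin (bin 0 = first tenth of the verdict)."""
    # min() keeps a position of exactly 1.0 in the last bin instead of an extra one.
    return Counter(min(int(p * bins), bins - 1) for p in positions)


print(relative_position(200, 50))                      # a reference a quarter into the text
print(position_histogram([0.05, 0.07, 0.5, 1.0]))      # two early, one middle, one at the end
```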
[Figure 3.2: bar chart of the number of references per keyword; frequency axis from 0 to 30,000.]
Figure 3.2: Frequencies of the keywords found automatically around all references to legislation in the field of immigration law.
Date
The date of the verdict is again a self-explanatory feature. It could be that something in the structure of references changed over the years. This feature is not expected to contribute much to the sort of categorisation sought in this research, but it could expose some structures in the data.

Type of court
In the Dutch judicial system, there are a number of different courts. Like the previous feature, this feature may contribute by exposing structures in the data.

Law-identification number
In Dutch legislation, every law has an identification number. This is evidently important, because it identifies the law a reference refers to. In the data, there were 138 unique law-identification numbers.

Reference URL
This could be seen as the actual reference. The string “artikel 28 van de Vreemdelingenwet 2000 (hierna Vw 2000)” translates into http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/28, which leads to the text of the original law. The parser works on different document structure levels, so http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/1b is different from http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/1. This could be an important feature if the categories of references cannot be described by the law-identification number alone, but differ at the article level. Given the assumption mentioned above, that different sections of a law play different roles, this could very well be an important feature.

Field of law
In the available data, the court opinions are labelled with one or more fields of law. This could be an interesting feature when multiple fields of law exist in the domain. However, this research is limited to the field of immigration law. Although some court opinions carried more labels (e.g. administrative and criminal law), these labels appeared to be assigned somewhat arbitrarily. Moreover, this information is already largely captured by the law-identification number.
Therefore, it has been decided not to include this feature in the analysis.
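The mapping from a resolved reference to its URL can be illustrated with a small sketch. It only covers the article-level pattern quoted above (`http://doc.metalex.eu:8080/page/id/<BWB id>/artikel/<article>`); other document structure levels are not handled, and the helper names `metalex_url` and `parse_metalex_url` are hypothetical.

```python
BASE_URL = "http://doc.metalex.eu:8080/page/id/"


def metalex_url(bwb_id, article):
    """Article-level reference URL, following the pattern quoted in the text."""
    return f"{BASE_URL}{bwb_id}/artikel/{article}"


def parse_metalex_url(url):
    """Split an article-level reference URL back into (bwb_id, article)."""
    if not url.startswith(BASE_URL):
        raise ValueError(f"unexpected URL: {url}")
    parts = url[len(BASE_URL):].split("/")
    if len(parts) != 3 or parts[1] != "artikel":
        raise ValueError("only article-level URLs are handled in this sketch")
    return parts[0], parts[2]
```

Because the article number is kept as a string, “1b” and “1” yield different URLs and therefore different values of the nominal ref url attribute, while both share the same law-identification number.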
3.2.3 Applied features
This is the last section on feature selection. It describes which features were selected for the analysis and how they were implemented. The features were built in Python⁴, except for the bag-of-words implementation and some clean-ups; these, like the clustering, were performed in Weka⁵. The following features have been used:
⁴ Python Software Foundation. Python Language Reference, version 2.7. Available at http://www.python.org
⁵ Weka 3.6.11, a suite of machine learning software written at the University of Waikato. See: Hall et al. (2009). Available at http://www.cs.waikato.ac.nz/ml/weka/

Position of the reference in the document (pos ref)
This feature was constructed from the document of the court opinion, the paragraph around the reference and the reference string, by first calculating the relative position of the paragraph in the document and subsequently the relative position of the string in the paragraph. Combining those yields the relative (or normalised) position.

Figure 3.3: Distribution of the position of the reference in the court opinion. A sample of 100,000 references was used for this plot.

Law-identification number (ref bwb)
This feature was copied directly from the original XML data. It was used in Weka as a nominal attribute.

Type of court (uitspraak hof)
This feature existed in two forms. In the normal form, it was copied from the original XML file and converted to a nominal attribute. The other is a simplified form that only distinguishes courts from higher courts and supreme courts. This simpler version was implemented because some local courts were very rare; in fact, almost all were from The Hague. The additional courts did not contribute much to the analysis, as can be seen in section 4.1.

Year (uitspraak year)
This simple feature was extracted from the original data. It was converted to “years ago” by subtracting the current year from the year of the verdict, so that, with 2014 as the current year, a verdict from 2012 has the value −2. This keeps the numbers small and easy to read.

Reference url (ref url)
This feature was simply copied from the original data and converted to a nominal attribute in Weka.

Keyword
The keyword was selected with exactly the same algorithm (Algorithm 1) as described in the previous section. Again, this attribute was converted to a nominal attribute.

Two rounds of analysis have been performed. The first round was more experimental; the second round was used for the actual interpretation. In the first round, all features mentioned here were used except the reference URL. This feature was left out to drastically reduce the dimensionality of the input data, which made the results much easier to interpret. In the second round, the reference URL was added and the simpler version of the uitspraak hof feature was used.
Additionally, all instances with the keyword “None” were given a missing value for the keyword instead. This should have been done earlier, but unfortunately was only discovered after the first analysis.
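The construction of pos ref and uitspraak year, and the export of the features to Weka with “None” replaced by a missing value, might look roughly like the sketch below. The text does not state exactly how the paragraph position and the within-paragraph position are combined; here the assumption is (paragraph index + relative offset within the paragraph) / number of paragraphs. The attribute names `pos_ref` and `keyword` and all function names are hypothetical. The sketch relies on two properties of Weka's ARFF format: nominal attributes are declared with their set of values in braces, and `?` denotes a missing value.

```python
def relative_position(paragraphs, paragraph_index, reference):
    """Normalised position (0 = start, 1 = end) of a reference in a court opinion.

    Assumed combination: the paragraph index plus the relative offset of the
    reference inside that paragraph, divided by the number of paragraphs.
    """
    paragraph = paragraphs[paragraph_index]
    offset = paragraph.find(reference)
    if offset == -1:
        raise ValueError("reference not found in paragraph")
    within = offset / len(paragraph)  # position inside the paragraph, in [0, 1)
    return (paragraph_index + within) / len(paragraphs)


def years_ago(verdict_year, current_year=2014):
    """uitspraak year as described in the text: verdict year minus current year (<= 0)."""
    return verdict_year - current_year


def to_arff(rows, relation="references"):
    """Write (pos_ref, keyword) rows as ARFF text for Weka.

    The keyword "None" is written as "?", ARFF's marker for a missing value.
    Assumes keyword labels contain no single quotes.
    """
    keywords = sorted({kw for _, kw in rows if kw != "None"})
    lines = [
        f"@relation {relation}",
        "@attribute pos_ref numeric",
        "@attribute keyword {" + ",".join(f"'{k}'" for k in keywords) + "}",
        "@data",
    ]
    for pos, kw in rows:
        lines.append(f"{pos},?" if kw == "None" else f"{pos},'{kw}'")
    return "\n".join(lines)
```

For example, for the reference “artikel 8 EVRM” in the second of three paragraphs, `relative_position(["Uitspraak.", "Gelet op artikel 8 EVRM.", "Beslissing."], 1, "artikel 8 EVRM")` gives (1 + 9/24) / 3 ≈ 0.458. Writing “None” as `?` means that EM treats these instances as having no keyword information, instead of treating “None” as one more keyword.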
3.3 Clustering
The clustering was performed in Weka 3.6.11, using the expectation maximisation (EM) algorithm. This algorithm maximises the log likelihood of the data given the model by fitting Gaussian distributions to numeric attributes; for nominal attributes it uses discrete estimators. Since it is not clear how many clusters we are looking for, multiple numbers of clusters were tried and compared by cross-validation. This is necessary because the log likelihood on the training data tends to increase as the number of clusters increases, so it cannot be used directly to choose that number; evaluating on held-out folds penalises unnecessarily high numbers of clusters. In Weka, this is done by setting the numClusters option to −1. The EM algorithm has the advantage that the algorithm itself is comprehensible and that its results are easy to interpret. For every numeric attribute it returns the µ and σ of the best-fitting Gaussian per cluster. For nominal attributes it returns frequencies that are Laplace corrected (i.e. one is added to every count, so that no value gets a probability of zero). Every instance belongs to every cluster with some likelihood. Although each instance is eventually assigned to the cluster with the highest likelihood, its values are distributed over the clusters in proportion to these likelihoods. Therefore all frequencies in the results are at least 1.
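The statement that all frequencies are at least 1 follows from combining soft assignment with the Laplace correction. The sketch below shows this for one nominal attribute; it illustrates the principle and is not Weka's implementation, and the function name and toy numbers are hypothetical.

```python
def soft_counts(values, responsibilities, domain):
    """Laplace-corrected, membership-weighted counts of a nominal attribute per cluster.

    values[i]           -- nominal value of instance i
    responsibilities[i] -- list with P(cluster k | instance i) for every cluster k
    domain              -- all possible values of the attribute
    """
    n_clusters = len(responsibilities[0])
    # Laplace correction: every (cluster, value) count starts at 1.
    counts = [{value: 1.0 for value in domain} for _ in range(n_clusters)]
    for value, resp in zip(values, responsibilities):
        for k, weight in enumerate(resp):
            counts[k][value] += weight  # the instance is spread over all clusters
    return counts


values = ["ingevolge", "ingevolge", "juncto"]
responsibilities = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
domain = ["ingevolge", "juncto", "inzake"]
counts = soft_counts(values, responsibilities, domain)
```

In this example, a value that no instance has (here “inzake”) still gets a count of exactly 1 in every cluster, which matches the values of 1 that appear in the raw results in Appendix B. Dividing each count by its cluster total would give the estimated probability of the value within that cluster.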
3.4 χ² Attribute Evaluation
To evaluate how much each attribute contributed to the formation of the clusters, the ChiSquaredAttributeEval of Weka 3.6.11 was used. This method calculates the χ² statistic of every feature with respect to the class. The result is a ranked list of the features, indicating how much each contributes to distinguishing the classes (here, the clusters).
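For a single nominal attribute, the χ² statistic compares the observed counts of each attribute value per cluster with the counts expected if value and cluster were independent. The sketch below computes Pearson's statistic for such a contingency table. It is an illustration only: as far as we know, Weka's ChiSquaredAttributeEval additionally discretises numeric attributes such as pos ref before computing the statistic, which is not shown here.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    table[i][j] -- number of instances with attribute value i in cluster j
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if attribute value and cluster were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic
```

An attribute whose values separate the clusters perfectly, such as `[[10, 0], [0, 10]]`, scores 20, whereas an independent one, such as `[[5, 5], [5, 5]]`, scores 0. Note that for fixed proportions the statistic grows linearly with the number of instances (doubling every count doubles it), so the merits of the 4000-instance test round and the 8000-instance final round should not be compared in absolute terms.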
Chapter 4
Results
This chapter discusses the results of the clustering algorithm. Since clustering results are subject to interpretation, it contains not only technical results but also focuses on their interpretation. As made clear in the introduction, this research focuses on finding natural categories that exist in the data. The author has little domain-specific knowledge, so the interpretation is done in a data-driven manner. The legal interpretation is left to the expert reader and to further research.
4.1 Test run
4.1.1 Data
In the first round of clustering, 4000 randomly selected instances (references) were used with the following features¹:
• pos ref
• ref bwb
• uitspraak hof
• uitspraak year
• keyword
The feature ref url was omitted based on earlier experiments. In those, the feature extraction was done differently: a Boolean vector of all ref urls was made. This means that instead of one nominal attribute there were 545 Boolean attributes, which resulted in high-dimensional, sparse data and made the results of the expectation maximisation algorithm difficult to interpret. In the second round, this feature was added to the analysis; these results can be found in section 4.2.
¹ For a translation of these abbreviations, see section 3.2.3 on page 25.
Average merit          Attribute
7631.957 ± 50.811      pos ref
2643.852 ± 19.53       ref bwb
2572.077 ± 19.586      uitspraak year
2300.943 ± 15.295      uitspraak hof
1460.295 ± 46.341      ref keyword
0 ± 0                  Instance number
Table 4.1: χ² results of the first round (EM clustering with 4000 instances).
           proportion  mean    std. dev.
Cluster0   0,14        0,3755  0,282
Cluster1   0,33        0,4974  0,2062
Cluster2   0,15        0,1357  0,0766
Cluster3   0,06        0,5371  0,2473
Cluster4   0,08        0,9053  0,0702
Cluster5   0,25        0,0191  0,0147
Table 4.2: Parameters of the distribution of pos ref per cluster as found by the EM algorithm in the test round.
4.1.2 χ² Evaluation
Table 4.1 shows that the feature pos ref has a much higher merit than the other features. This appears to be a good result, since (as mentioned in section 3.2.2) court opinions are generally well structured, which leaves open the possibility that certain laws appear at certain locations in a verdict. On the other hand, it was not expected that uitspraak year and uitspraak hof would merit more than ref keyword. Although styles may have changed over the years, the year was not expected to contribute significantly. The same holds for uitspraak hof, especially since the simpler version (described in section 3.2.3) was not used in this round.
4.1.3 Interpretation
The expectation maximisation algorithm distinguished six different clusters. The raw results can be found in Appendix B.1 on page 50. A full summary of the results is not presented here, because it is not relevant to this thesis. An example of what such a summary would look like:
Cluster0 This cluster contains 565 (14%) instances. It contains mostly recent court opinions, since the parameters of the best-fitting distribution of uitspraak year are µ = −1.14 and σ = 0.8. Furthermore, it contains more than 90% of the occurrences of the BWB number belonging to “Verdrag betreffende de werking van de Europese Unie” (in English: “Treaty on the Functioning of the European Union”).
Cluster1 ...
Such a summary will be made for the final clustering, but is left out here because of the results of the χ² attribute evaluation, which indicate that the results gathered here are not the sort of results sought in this research. The next section discusses the results of the second round in more detail.
4.2 Final run
4.2.1 Data
After experimenting with different variables and sample sizes, a sample size of 8000 was chosen for the final experiments. One run took approximately ten minutes to complete, which made it possible to experiment with slight changes without it being too time-consuming. This time, the following features were used for the analysis:
• pos ref
• ref bwb
• uitspraak hof (simplified, as explained in section 3.2.3)
• uitspraak year
• keyword
• ref url
Compared to the previous analysis, only two changes have been made: uitspraak hof was replaced by the simpler version, and ref url was added. Since the EM algorithm is not guaranteed to find a global optimum, the experiment was conducted five times with different seeds (for the random initialisation) and the result with the highest log likelihood was used for interpretation. This still does not guarantee a global optimum, but that is a limitation of the algorithm that cannot be avoided entirely.
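The restart strategy can be illustrated with a minimal one-dimensional Gaussian mixture, for instance on pos ref alone. The sketch below is not Weka's EM, which also models nominal attributes and has its own initialisation and stopping rule: it fits a fixed number of components for a fixed number of iterations, uses a small variance floor (an assumption) to avoid degenerate components, and keeps the run with the highest log likelihood across seeds, mirroring the five seeds used in the final round. All names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_1d(data, k, seed, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (log_likelihood, weights, means, variances). The seed only affects
    the random initialisation of the means.
    """
    rng = random.Random(seed)
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(min_var, sum((x - overall_mean) ** 2 for x in data) / n)
    means = rng.sample(data, k)  # random initialisation from the data points
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj)
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )
    return log_lik, weights, means, variances


def best_of_restarts(data, k, seeds):
    """Run EM once per seed and keep the fit with the highest log likelihood."""
    return max((em_1d(data, k, s) for s in seeds), key=lambda fit: fit[0])
```

For example, `best_of_restarts(positions, 2, range(5))` runs five seeded fits on a list of pos ref values and returns the best one. Keeping the best of several seeds reduces, but does not remove, the risk of ending in a poor local optimum.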
4.2.2 χ² Evaluation
The results of the χ² evaluation, presented in Table 4.3, show that ref url is the most distinguishing attribute, followed by ref bwb, ref keyword and pos ref. The features uitspraak hof and uitspraak year are far less distinguishing. This ranking is compatible with the ideas discussed in section 3.2.2. The high distinguishing capacity of ref url is consistent with the idea that different parts of a law are referred to for different reasons.

Average merit          attribute
56718.739 ± 289.697    ref url
28381.434 ± 187.74     ref bwb
12785.56 ± 46          ref keyword
9714.102 ± 89.068      pos ref
1477.885 ± 24.557      uitspraak hof
1364.928 ± 63.517      uitspraak year
0 ± 0                  Instance number
Table 4.3: χ² results of the final round (EM clustering with 8000 instances).

ref bwb also has a high distinguishing capacity. This is not directly in line with that idea, since it suggests that clusters also distinguish on entire laws (instead of parts of a law). The two do, however, remain compatible: either the clusters distinguish on a combination of the law and the specific article, or some clusters distinguish on laws and others on articles. The feature ref keyword merits relatively more than in the test round. It would be interesting to compare the clusters with the categorisation made by our third expert, as mentioned in section 3.2.2. This comparison will be made below. These interpretations should be made cautiously: a feature can obtain a high merit by distinguishing only one cluster, but doing so very well. The χ² evaluation is nevertheless useful to get a first impression of the distinguishing capacity of the features.
4.2.3 Interpretation
The EM algorithm distinguished 12 different clusters in the final round. For this round, the raw results were more difficult to interpret, for two reasons. The first is that there were 12 clusters (compared to 6 in the test round). The second is that there were about eight times as many attribute values, because the feature ref url was added². To be able to interpret the results, the following steps were taken:
• All values of the attributes ref url and ref bwb that occurred fewer than 12 times were ignored.
• All frequency values (i.e. the results for all features except pos ref and uitspraak year) were normalised by dividing them by the relative frequency of the cluster and by the total frequency of that attribute value.
² Another reason for the drastic increase in attribute values was the doubling of the number of instances, which also added many new values for ref bwb.
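The second interpretation step can be read as computing, for every cluster and attribute value, the ratio count / (relative cluster size × total count of that value). Under this reading, a ratio of 1 means that the value occurs in the cluster exactly as often as the cluster's size predicts, and values above 1 indicate over-representation. The sketch below implements this reading together with the threshold of 12; the function name and data layout are hypothetical.

```python
def lift_table(counts, cluster_priors, min_total=12):
    """Normalise per-cluster frequencies of one nominal attribute.

    counts[k][v]      -- frequency of value v in cluster k
    cluster_priors[k] -- relative size of cluster k (its proportion)
    Values whose total frequency is below min_total are dropped.
    """
    totals = {}
    for cluster in counts:
        for value, n in cluster.items():
            totals[value] = totals.get(value, 0.0) + n
    kept = {value for value, total in totals.items() if total >= min_total}
    return [
        {value: n / (prior * totals[value]) for value, n in cluster.items() if value in kept}
        for cluster, prior in zip(counts, cluster_priors)
    ]
```

With two equally large clusters and a value counted 30 times in one and 10 times in the other, the ratios are 1.5 and 0.5: the value is over-represented in the first cluster and under-represented in the second.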
Figure 4.1: Frequency distribution of the clusters found with the EM algorithm with 8000 instances.

           proportion  mean    std. dev.
Cluster0   0,02        0,5861  0,4626
Cluster1   0,11        0,0164  0,0126
Cluster2   0,11        0,3759  0,2585
Cluster3   0,3         0,4221  0,2768
Cluster4   0,08        0,0207  0,0159
Cluster5   0,07        0,8693  0,0764
Cluster6   0,09        0,3426  0,2157
Cluster7   0,1         0,325   0,3294
Cluster8   0,01        0,5538  0,2031
Cluster9   0,01        0,4963  0,2812
Cluster10  0,06        0,291   0,2249
Cluster11  0,04        0,3144  0,2677
Table 4.4: Parameters of the distribution of pos ref per cluster as found by the EM algorithm in the final round.
The results of these steps are presented in appendix B.2 on page 52 and will be summarised here. The interpretation will be done per cluster in order of cluster size. A plot of the frequency distribution of the clusters can be seen in figure 4.1.
Cluster3 This cluster contains 30% of the instances. The following keywords occur relatively often:
• “op grond van” (“on the basis of”)
• “is bepaald” (“is laid down”)
• “ingevolge” (“pursuant to”)
• “op basis van” (“on the basis of ...”)
• “krachtens” (“by virtue of ...”)
• “op de voet van” (“in accordance with ...”)
• “als genoemd in” (“as mentioned in”)
• “bevat de bevoegdheid” (“... holds the competence”)
• “voldoen aan” (“complies with”)
• “luidt” (“reads”)
• “volgt uit” (“follows from”)
The keyword “is niet van toepassing” (“does not apply”) almost never occurs in this cluster. Most references (97%) of this cluster refer to the immigration law (“Vreemdelingenwet 2000”), and 66% of all references to the immigration law are in this cluster. There are no references to administrative law in this cluster.
Cluster1 This cluster contains 11% of the instances. The following keywords occur relatively often:
• “op grond van” (“on the basis of”)
• “gelet op” (“having regard to”)
• “ingevolge” (“pursuant to”)
• “op basis van” (“on the basis of ...”)
• “in afwijking van” (“as a departure from”)
• “in samenhang met” (“in conjunction with...”)
• “neergelegd in” (“laid down in”)
• “inzake” (“on the subject of ...”)
• “juncto” (“in conjunction with...”)
• “als genoemd in” (“as mentioned in”)
• “met toepassing van” (“applying ...”)
It contains relatively many references to a law concerning the reception of asylum seekers (“Wet Centraal Orgaan opvang asielzoekers”), but also many to the immigration law. Among the specific references are:
• Article 34a of the Immigration Law, which describes the circumstances in which an application for a residence permit can be denied.
• Article 33a of the Immigration Law, which describes the competence of the Minister concerning residence permits.
• Article 70 of the Immigration Law, which states which articles apply to an appeal based on certain other articles.
The references in this cluster appear at the beginning of the court opinion, as can be seen in Table 4.4: both the mean and the standard deviation of pos ref are very small for this cluster.
Cluster2 This cluster contains 11% of the instances. The following keywords occur relatively often:
• “als bedoeld in” (“as referred to in”)
• “is bepaald” (“is laid down”)
• “moet getoetst worden” (“needs to be tested against”)
• “ingevolge” (“pursuant to”)
• “op basis van” (“on the basis of ...”)
• “neergelegd in” (“laid down in”)
• “juncto” (“in conjunction with...”)
• “voldoen aan” (“complies with”)
• “luidt” (“reads”)
• “is opgenomen in” (“is included in”)
The keywords “is niet van toepassing” (“does not apply”) and “als genoemd” (“as mentioned”) almost never occur in this cluster.
Cluster7 This cluster contains 10% of the instances. The following keywords occur relatively very often:
• “moet getoetst worden” (“needs to be tested against”)
• “schending van” (“violation of”)
• “met beroep op” (“invoking ...”)
The following keywords occur less often than the ones above, but still relatively often:
• “in strijd met” (“in violation of”)
• “in de zin van” (“within the meaning of”)
• “gelet op” (“having regard to”)
• “inzake” (“on the subject of ...”)
• “voldoen aan” (“complies with”)
The following keywords almost never occur in this cluster:
• “is niet van toepassing” (“does not apply”)
• “in afwijking van” (“as a departure from”)
• “als genoemd” (“as mentioned”)
This cluster contains all references to the Convention for the Protection of Human Rights and Fundamental Freedoms. Most of these references are to article 3 (prohibition of torture), article 8 (right to respect for private and family life) and article 5 (right to liberty and security).
Cluster4 This cluster contains 8% of the instances. The following keywords occur relatively often:
• “met toepassing van” (“applying ...”)
• “juncto” (“in conjunction with...”)
The following keywords almost never occur in this cluster:
• “moet getoetst worden” (“needs to be tested against”)
• “als genoemd in” (“as mentioned in”)
This cluster contains references to multiple laws. A noticeable article that occurs often in this cluster is article 8:54 of the administrative law, which enumerates the reasons why a judge in administrative law can end an investigation. Another notable article is article 8:75, which states when a judge is competent to order a party to pay the legal costs of the other party. There are more noticeable articles from chapter 8, all of which cover procedural rules. This could indicate that the keyword “met toepassing van” (“applying ...”) is often used for procedural references. As in cluster 1, most references in this cluster occur at the beginning of the court opinion.
Cluster5 This cluster contains 8% of the instances. The following keywords occur relatively often:
• “in strijd met” (“in violation of”)
• “is bepaald” (“is laid down”)
• “met toepassing van” (“applying ...”)
• “is opgenomen in” (“is included in”)
The following keywords almost never occur in this cluster:
• “moet getoetst worden” (“needs to be tested against”)
• “inzake” (“on the subject of ...”)
This cluster contains references to several laws. A noticeable one is the Convention on the Rights of the Child. An article of this convention that is referred to a lot from this cluster is article 3, which contains provisions such as “In all actions concerning children, whether undertaken by public or private social welfare institutions, courts of law, administrative authorities or legislative bodies, the best interests of the child shall be a primary consideration.”. Another noticeable article that is referred to often is article 8:74 of the administrative law. This article dictates that when an appeal is declared well-founded, the court fee paid should be refunded. Another recurring article is article 3:2 of the administrative law, which dictates that the governing body should gather the necessary knowledge about the relevant facts and the interests to be weighed. In contrast to clusters 1 and 4, most references in this cluster occur at the end of the court opinion.
Cluster10 This cluster contains 6% of the instances. The keyword “als bedoeld in” (“as referred to in”) occurs relatively very often. The keyword “in afwijking van” (“as a departure from”) also occurs often, but less than “als bedoeld in”. The cluster contains references to several laws. Noticeable references at article level are articles 14, 28 and 33 of the Immigration Law, which all state competences of the Minister regarding different subjects. Another is article 8 of the same law, which defines lawful residence by enumerating the conditions for it.
Cluster11 This cluster contains 4% of the instances. The following keywords occur relatively often:
• “in de zin van” (“within the meaning of”)
• “ingevolge” (“pursuant to”)
• “in afwijking van” (“as a departure from”)
• “in samenhang met” (“in conjunction with...”)
• “neergelegd in” (“laid down in”)
• “bevat de bevoegdheid” (“... holds the competence”)
• “luidt” (“reads”)
The following keywords almost never occur in this cluster:
• “is niet van toepassing” (“does not apply”)
• “inzake” (“on the subject of ...”)
• “als genoemd in” (“as mentioned in”)
Due to the small cluster size, this cluster is hard to interpret. It contains references to several laws and several articles within those laws, and no clear patterns were found. The standard deviation of uitspraak year is the lowest of all clusters (0.82 years). For an unclear reason, references from recent court opinions seem to have gathered in this cluster, with a mean uitspraak year of 1.7 years ago.
Cluster0 This very small cluster contains 4% of the instances. The keyword “is niet van toepassing” (“does not apply”) is the only keyword in this cluster. It always occurs with the reference to article 6 of the Administrative Law (“Algemene wet bestuursrecht”), which concerns whether an appeal can be declared inadmissible.
Cluster9 This very small cluster contains the keywords “gelet op” (“having regard to”) and “voldoen aan” (“complies with”). It mostly refers to the immigration law. Due to the small size of this cluster, no other results are worth mentioning.
Cluster8 This very small cluster appears to contain references from court opinions of on average 13.3 years ago (with a standard deviation of 1.05 years). Due to its small size it has no other distinctive features, except that it is the only cluster with a reference to the criminal code (Wetboek van Strafrecht). It is interesting to note that a new immigration law was adopted 14 years ago. This may have changed the data considerably, leading the EM algorithm to assign a separate cluster to these references.
Chapter 5
Conclusion
5.1 Findings
This thesis describes exploratory research into the possibility of categorising references from case law to legislation, by investigating whether natural patterns exist in the properties (i.e. features) that describe those references. It was found that this is indeed possible. This was shown by applying an unsupervised clustering algorithm (expectation maximisation) to data consisting of features extracted from Dutch case law. The algorithm distinguishes 12 clusters. The four most discriminating features are (in order): (1) the URL of the specific article the reference refers to, (2) the law (without further specification), (3) the keyword that was identified and (4) the position of the reference within the court opinion (see section 3.2 for an extensive discussion of the features that were used). Some notable clusters are:
• Cluster 3, the largest cluster, which mainly contains references to the Dutch immigration law (“Vreemdelingenwet 2000”) and no references to administrative law. Among others, it contains many occurrences of “op grond van” (“on the basis of”), “is bepaald” (“is laid down”), “ingevolge” (“pursuant to”) and “op basis van” (“on the basis of ...”).
• Cluster 7, which contains all references to the Convention for the Protection of Human Rights and Fundamental Freedoms. It contains many occurrences of “moet getoetst worden” (“needs to be tested against”), “schending van” (“violation of”) and “met beroep op” (“invoking ...”).
• Cluster 4, which contains many occurrences of “met toepassing van” (“applying”) and seems to refer mostly to procedural rules.
These findings show that, at least from a data-driven point of view, multiple categories can indeed be distinguished.
5.2 Discussion
It is important to keep in mind that a data-driven categorisation is not necessarily natural to, or acceptable for, jurists. The question of whether this categorisation is useful to jurists remains open. Moreover, as can be seen by comparing the results of the test round with those of the final round, the clustering depends highly on the features that were chosen. It could be interesting to try different combinations of features. It should also be noted that some features discussed in section 3.2 have not been used in the implementation, due to the limited time available for this project. These features are:
• The relative frequency of the reference in a court opinion (what van Opijnen (2014) called “multiplicity”).
• The hierarchical position of the law (see section 3.2).
• Document structure level (idem).
• Bag of Words, which was actually implemented, but left out due to the increased complexity (which resulted in a much longer execution time¹ and results that were more difficult to interpret). A bag of words could be interesting on both sides of the reference: on the paragraph of the court opinion in which the reference occurs, but also on the section of the law the reference refers to.
• The position of the specific section referred to in the law.
These were all interesting candidates that have been omitted only due to the limited time available. There is one other point of critique on the analysis that was performed. It could be useful to remove instances with attribute values that occur less often than a certain threshold. In the approach followed here, these values were only hidden after the clustering, for interpretation. It is arguably better to remove them beforehand, since the results of the clustering algorithm will then be cleaner and easier to interpret; these rare cases will likely only add noise to the clusters.
Additionally, it could be interesting to look at the performance of alternative clustering algorithms. Expectation maximisation was chosen for its comprehensibility, but other algorithms may lead to different results.
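The suggestion to remove rare attribute values before clustering could be implemented along the lines of the sketch below. The dictionary-based instances, attribute names and threshold are illustrative assumptions; the threshold of 12 simply reuses the value from the interpretation step in section 4.2.3. A single pass is shown; because removing instances can make other values rare, the filter could be repeated until nothing changes.

```python
from collections import Counter


def drop_rare(instances, attributes, min_count=12):
    """Remove instances holding a value that occurs fewer than min_count times.

    instances  -- list of dicts, e.g. {"ref_bwb": "BWBR0011823", "ref_url": "..."}
    attributes -- names of the nominal attributes to check
    """
    frequencies = {a: Counter(inst[a] for inst in instances) for a in attributes}
    return [
        inst for inst in instances
        if all(frequencies[a][inst[a]] >= min_count for a in attributes)
    ]
```

Applied before clustering, this keeps the rare values from forming (or distorting) clusters, at the cost of discarding the instances that carry them.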
5.3 Suggestions for further research
The points of discussion mentioned above lay the ground for further research. Apart from the methodological adjustments mentioned above, it would be interesting to study the legal relevance of this approach. There are multiple ways to do so. A direct way would be to show the results to a jurist and use this research as a basis for distinguishing categories that are natural to jurists, perhaps by splitting or combining some of the clusters found.
¹ The execution time of the EM algorithm with the bag of words was slightly over 24 hours.
Another possibility would be to add the cluster label as a dimension in network analysis of the citation network. Building on the work of Fowler et al. (2007) and Winkels et al. (2011) (as mentioned in the introduction), one could investigate whether restricting such an analysis to the references of a single cluster changes how well those methods identify relevant sources.
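As a first step towards such an analysis, the citation network could be restricted to the references of one cluster before a network measure is computed. The sketch below uses the simplest such measure, the in-degree (number of incoming references) of each cited article; it is not the method of Fowler et al. (2007) or Winkels et al. (2011), and the triple format and function name are hypothetical.

```python
from collections import Counter


def in_degree_by_cluster(references, cluster=None):
    """Count incoming references per cited article, optionally for one cluster only.

    references -- iterable of (court_opinion_id, article_url, cluster_label) triples
    cluster    -- if given, only references with this cluster label are counted
    """
    return Counter(
        target for _, target, label in references
        if cluster is None or label == cluster
    )
```

Comparing the ranking of articles by in-degree over all references with the ranking within a single cluster would give a first indication of whether the cluster label changes which sources appear most relevant.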
References
Fowler, J. H., Johnson, T. R., Spriggs, J. F., Jeon, S., & Wahlbeck, P. J. (2007). Network analysis and the law: Measuring the legal importance of precedents at the US Supreme Court. Political Analysis, 15(3), 324–346.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.
Jackson, P., Al-Kofahi, K., Tyrrell, A., & Vachher, A. (2003). Information extraction from case law and retrieval of prior cases. Artificial Intelligence, 150(1), 239–290.
Mazzega, P., Bourcier, D., & Boulet, R. (2009). The network of French legal codes. In Proceedings of the 12th International Conference on Artificial Intelligence and Law (pp. 236–237).
van Opijnen, M. (2014). Op en in het web (pp. 444–446). Den Haag: Boom Juridische uitgevers.
Winkels, R., Boer, A., & Plantevin, I. (2013). Creating context networks in Dutch legislation. Available at SSRN.
Winkels, R., Ruyter, J., & Kroese, H. (2011). Determining authority of Dutch case law. Legal Knowledge and Information Systems, 235, 103–112.
Appendices
Appendix A
Keyword Regular Expressions
Keyword
    Regular expression

ingevolge
    [Ii]ngevolge[^a-zA-Z][Aa]rtikel|[Aa]rt\\.? ([0-9][0-9a-zA-Z:.]*)|[Bb]oek ([0-9][0-9a-zA-Z:.]*)|[Hh]oofdstuk [0-9][0-9a-zA-Z:.]*
(als/in) bedoeld(e) (in/onder)
    bedoelde?
op grond van
    [Oo]p grond van[^a-zA-Z][Aa]rtikel|[Aa]rt\\.? ([0-9][0-9a-zA-Z:.]*)|[Bb]oek ([0-9][0-9a-zA-Z:.]*)|[Hh]oofdstuk [0-9][0-9a-zA-Z:.]*
met/om toepassing van/aan
    (toepassing( van| aan)?)
(in …) (is) bepaald(e)
    (is )?bepaalde?
(in/met) strijd(ige) (oplevert/gehandeld)
    ((in )?strijd(ige)?.+met)
in de zin van
    in de zin van
(zich/met) beroep(en) op
    beroep(en)? op
krachtens
    [Kk]rachtens
juncto (/jo)
    ([Jj]uncto| [Jj]o | [Jj]o| ˙ [Jj] )
… (is) niet van toepassing (is)
    niet van toepassing
(in … is) neergelegd (in)
    ( neergelegde? )
in … beschreven
    beschreven
gelet op
    gelet op
schending van
    schending
volgt uit
    volgt uit
in … gemaakte onderscheid
    gemaakte onderscheid
(als) vermeld (in/onder)
    (als)? vermeld (in|onder)
in afwijking van
    in afwijking van
in … voorgeschreven
    voorgeschreven
inzake
    inzake
in samenhang met …
    in samenhang met
zoals blijkt uit (de tekst van)
    zoals blijkt uit
(niet) op de voet van
    op de voet van
Table A.1: Keywords with their regular expressions, as used in Algorithm 1 on page 23.
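Algorithm 1 itself is not reproduced in this appendix, so the Python sketch below only illustrates one way patterns like those in Table A.1 could be used: each article reference is paired with the nearest keyword that precedes it. It is an assumption-laden illustration, not the parser used in this thesis. The article pattern `ARTICLE`, the 30-character `WINDOW` and the helper `label_references` are hypothetical, and only five keywords are included, simplified to the keyword itself.

```python
import re

# A subset of the keyword regexes from Table A.1, simplified to the keyword
# itself (the full "ingevolge" / "op grond van" entries also match the article).
KEYWORDS = {
    "ingevolge": re.compile(r"[Ii]ngevolge"),
    "op grond van": re.compile(r"[Oo]p grond van"),
    "krachtens": re.compile(r"[Kk]rachtens"),
    "gelet op": re.compile(r"gelet op"),
    "in samenhang met": re.compile(r"in samenhang met"),
}

# Hypothetical article-reference pattern, e.g. "artikel 15" or "art. 8:74".
ARTICLE = re.compile(r"\b(?:[Aa]rtikel|[Aa]rt\.?)\s+([0-9][0-9a-zA-Z:.]*)")

WINDOW = 30  # hypothetical: characters searched before each reference


def label_references(text):
    """Pair each article number with the closest preceding keyword, or None."""
    labelled = []
    for ref in ARTICLE.finditer(text):
        context = text[max(0, ref.start() - WINDOW):ref.start()]
        best, best_end = None, -1
        for name, pattern in KEYWORDS.items():
            for match in pattern.finditer(context):
                if match.end() > best_end:  # keep the keyword nearest the reference
                    best, best_end = name, match.end()
        labelled.append((ref.group(1), best))
    return labelled


sentence = ("Op grond van artikel 15 Vreemdelingenwet (Vw) in samenhang met "
            "artikel 1(A) van het Verdrag")
print(label_references(sentence))
# [('15', 'op grond van'), ('1', 'in samenhang met')]
```

The window keeps a keyword from being attributed to a reference far away in the sentence. In this example "op grond van" lies outside the 30-character window of the second reference, so only "in samenhang met" is considered for it.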
Appendix B
Raw results
B.1 Test round
Attribute | Cluster0 | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5
--- | --- | --- | --- | --- | --- | ---
Relative frequency | 0,14 | 0,33 | 0,15 | 0,06 | 0,08 | 0,25
pos ref: mean | 0,3755 | 0,4974 | 0,1357 | 0,5371 | 0,9053 | 0,0191
pos ref: std. dev. | 0,282 | 0,2062 | 0,0766 | 0,2473 | 0,0702 | 0,0147
uitspraak year: mean | -1,1412 | -7,8355 | -7,4231 | -6,37 | -8,0363 | -6,7169
uitspraak year: std. dev. | 0,8092 | 3,7749 | 3,823 | 3,6597 | 3,9888 | 3,7172
ref keyword | | | | | |
ingevolge | 89,0693 | 267,5342 | 187,7267 | 1,5785 | 59,9623 | 145,1289
(als/in) bedoeld(e) (in/onder) | 67,5547 | 249,8747 | 100,2274 | 5,9775 | 17,1142 | 117,2514
(zich/met) beroep(en) op | 11,404 | 6,9201 | 1,7582 | 20,6208 | 3,393 | 12,904
op grond van | 40,6032 | 171,7409 | 150,2773 | 2,0361 | 21,8895 | 77,4529
… geldt | 150,3058 | 290,3798 | 62,4397 | 121,5841 | 33,8845 | 283,406
None | 1,7992 | 10,1803 | 6,0478 | 1,3344 | 2,9797 | 9,6586
schending van | 13,9371 | 2,2618 | 1,0392 | 29,1376 | 4,5491 | 16,0753
met/om toepassing van/aan | 62,5492 | 103,9523 | 31,6289 | 2,6784 | 55,4965 | 150,6947
krachtens | 21,4209 | 12,0038 | 7,3618 | 2,0465 | 2,1654 | 5,0016
(in …) (is) bepaald(e) | 15,0782 | 94,9045 | 24,6923 | 9,7344 | 39,5766 | 42,0139
(in/met) strijd(ige) (oplevert/gehandeld) (met) | 34,6131 | 9,5008 | 3,6067 | 41,9386 | 21,6812 | 36,6596
gelet op | 13,0522 | 18,3914 | 4,9101 | 3,6347 | 6,3594 | 8,6522
(in … is) neergelegd (in) | 6,2539 | 15,661 | 2,8644 | 1,3948 | 3,3727 | 11,4533
… (is) niet van toepassing (is) | 1,1116 | 1,9867 | 2,3966 | 1,0258 | 40,9616 | 24,5176
in samenhang met … | 14,0294 | 5,7582 | 3,6413 | 2,7507 | 2,7549 | 15,0655
volgt uit | 2,0306 | 2,4436 | 2,4822 | 1,0722 | 1,0008 | 2,9706
in de zin van | 10,7994 | 54,1728 | 7,9651 | 4,8548 | 7,4727 | 24,7351
(niet) op de voet van | 1,8796 | 8,1714 | 9,5267 | 1,1358 | 4,335 | 2,9515
juncto (/jo) | 1,918 | 6,3221 | 1,5798 | 3,6901 | 1,1603 | 10,3296
inzake | 1,0104 | 1,18 | 3,4906 | 1,9876 | 1,0007 | 5,3307
(als) vermeld (in/onder) | 2,6262 | 2,9217 | 1,4426 | 1,0095 | 1 | 1
in … beschreven | 1,028 | 1,1685 | 1,0139 | 2,0236 | 1,8039 | 1,9621
in afwijking van | 4,4967 | 6,2668 | 1,1911 | 1,0452 | 1,0001 | 1
in … voorgeschreven | 1,9158 | 1,0042 | 1 | 1,0017 | 1,0783 | 1
[total] | 570,4867 | 1344,7014 | 620,3105 | 265,2936 | 335,9923 | 1007,2154
uitspraak hof | | | | | |
Raad van State | 355,383 | 191,4574 | 48,4807 | 29,52 | 8,9509 | 153,208
Rechtbank ’s-Gravenhage | 2,3956 | 1122,8542 | 531,4983 | 202,9532 | 296,7334 | 812,5653
Rechtbank Haarlem | 1,2781 | 1,5467 | 2,9625 | 1,2118 | 1 | 1,0008
Rechtbank Den Haag | 168,2849 | 1,0594 | 1,1705 | 1,3938 | 1,2407 | 4,8507
Rechtbank Maastricht | 1 | 3,1235 | 3,8871 | 1,0532 | 1 | 2,9362
Rechtbank Limburg | 7,0453 | 1,0695 | 1,2558 | 1,2403 | 1,047 | 1,3421
Gerechtshof ’s-Gravenhage | 1,0026 | 1,3484 | 1,2598 | 1,4526 | 1 | 3,9364
Rechtbank Midden-Nederland | 2,9036 | 1,0882 | 1 | 1,0067 | 1,0015 | 1
Rechtbank Groningen | 1 | 1,8129 | 1,019 | 1,329 | 1,8392 | 1
Rechtbank Amsterdam | 4,303 | 1,6385 | 6,9272 | 6,941 | 4,1343 | 6,056
Rechtbank Zwolle | 1 | 1,0465 | 1,0001 | 1,9534 | 1 | 1
Rechtbank Rotterdam | 3,9491 | 2,9738 | 1,0824 | 1,0757 | 1,0166 | 1,9023
Rechtbank Leeuwarden | 1 | 1,9941 | 1,0029 | 1,0031 | 1 | 1
Rechtbank Oost-Brabant | 3,8605 | 1,0108 | 1,1879 | 1,0235 | 1,0012 | 1,916
Gerechtshof ’s-Hertogenbosch | 1,2222 | 1,0219 | 1,6753 | 1,0806 | 1 | 1
Rechtbank Alkmaar | 1 | 1,0882 | 3,909 | 1,0028 | 1 | 1
Rechtbank Assen | 1 | 1,0066 | 1,0281 | 1,0002 | 1 | 1,965
Rechtbank Arnhem | 1,0001 | 1,0722 | 2,9703 | 1,0243 | 2,9557 | 1,9774
Centrale Raad van Beroep | 1 | 1,1088 | 1,0002 | 1,891 | 1 | 1
Rechtbank Roermond | 1 | 1,0892 | 1,8849 | 1,0259 | 1 | 1
Rechtbank Zwolle-Lelystad | 1,8492 | 1,0037 | 1,0731 | 1,074 | 1 | 1
Rechtbank Zutphen | 1 | 1,0954 | 1,0048 | 2,0045 | 1,8953 | 1
Rechtbank Zeeland-West-Brabant | 5,9393 | 1,0668 | 1,0261 | 1,0273 | 1,3029 | 1,6376
Rechtbank ’s-Hertogenbosch | 1 | 1,1225 | 1 | 1,0039 | 1,8735 | 1
Rechtbank Middelburg | 1,0703 | 1,0021 | 1,0046 | 1,0016 | 1 | 1,9214
[total] | 571,4867 | 1345,7014 | 621,3105 | 266,2936 | 336,9923 | 1008,2154
ref bwb | | | | | |
BWBR0002757 | 36,7627 | 2,9095 | 8,886 | 3,3966 | 2,3012 | 20,744
BWBR0011825 | 53,0331 | 197,428 | 29,6223 | 7,2503 | 2,1589 | 85,5074
BWBR0011823 | 195,2172 | 759,0232 | 397,1127 | 2,6985 | 55,2799 | 351,6685
BWBV0001000 | 62,0087 | 34,6637 | 3,3941 | 170,0476 | 12,0784 | 141,8074
BWBR0005537 | 111,6207 | 248,2923 | 135,6028 | 6,681 | 234,2928 | 314,5103
BWBV0001002 | 16,6374 | 31,3587 | 5,7631 | 7,248 | 1,6791 | 21,3137
BWBR0002629 | 1,0101 | 1,0036 | 1,984 | 1,0014 | 1 | 1,0008
BWBV0002508 | 4,9385 | 2,1081 | 1,9777 | 4,8137 | 2,21 | 1,952
BWBR0022704 | 2,8816 | 1,0698 | 1,8873 | 1,0752 | 1 | 2,0861
BWBR0001903 | 1,2222 | 2,5168 | 1,6769 | 1,8417 | 2,7424 | 1
BWBR0002368 | 1,0253 | 2,8556 | 1,0715 | 1,0476 | 1 | 1
BWBV0004701 | 10,2719 | 4,368 | 1,7594 | 17,9337 | 1,2238 | 4,4432
BWBR0017959 | 2,9726 | 15,9566 | 2,1345 | 2,8568 | 2,3464 | 6,7332
BWBR0001854 | 5,0757 | 3,7991 | 3,7656 | 5,9533 | 1 | 5,4062
BWBV0001506 | 27,5894 | 1,4506 | 1,0279 | 2,6637 | 1,0019 | 3,2665
BWBR0010424 | 1 | 1,9728 | 1,0061 | 1,0211 | 1 | 1
BWBV0005225 | 1 | 1,0613 | 1,6329 | 1,0993 | 1 | 1,2066
BWBR0003391 | 1 | 2,5413 | 1,1011 | 1,3576 | 1 | 1
BWBR0006685 | 1,9833 | 1,1076 | 4,4992 | 1,065 | 1 | 10,3449
BWBR0011695 | 1,0006 | 1,0056 | 1,09 | 1,0109 | 1 | 2,8928
BWBR0003738 | 11,865 | 4,1167 | 3,1228 | 1,3855 | 1,04 | 3,47
BWBR0007149 | 11,3357 | 1,6532 | 1,6644 | 1,2258 | 1,8393 | 7,2815
BWBR0002154 | 1,0184 | 3,7906 | 1,0119 | 1,1787 | 1,0004 | 1
BWBR0001827 | 1,0017 | 1,8822 | 1,0122 | 3,1313 | 1 | 1,9726
BWBR0005290 | 1 | 1,9646 | 1,0256 | 1,0099 | 1 | 1
BWBR0012002 | 3,0903 | 3,8051 | 2,8075 | 5,7519 | 1,9253 | 6,6199
BWBR0009709 | 1,9788 | 1,4071 | 1,0644 | 1,6301 | 1,001 | 2,9185
BWBR0017646 | 1,0115 | 2,5538 | 1,0504 | 1,4382 | 1 | 1,946
BWBR0001840 | 1,9837 | 1,2629 | 1,01 | 3,7285 | 1,0003 | 1,0146
BWBR0002320 | 1,268 | 1,5212 | 1,0003 | 1,2104 | 1 | 1
BWBR0010346 | 1 | 1,453 | 1 | 1,5469 | 1,0001 | 1
BWBR0021505 | 1 | 1,6616 | 1,0732 | 1,516 | 1 | 1,7492
BWBR0002448 | 1 | 3,0799 | 1,6763 | 1,1605 | 1,0833 | 1
BWBR0009726 | 1 | 1,4794 | 1,1386 | 2,5206 | 1,0739 | 4,7875
BWBV0001021 | 1 | 1,0003 | 1,0296 | 1,0041 | 1 | 1,966
BWBR0013409 | 1 | 1,7549 | 2,2336 | 1,0115 | 1 | 1
BWBR0007333 | 1 | 1,1088 | 1,0002 | 1,891 | 1 | 1
Table B.1: Results of the first EM clustering, run on 4000 instances. From ref keyword onwards, every value is a Laplace-corrected frequency.
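A minimal Python sketch of what a Laplace-corrected frequency could look like, assuming the usual add-one form: every attribute value starts with a pseudo-count of 1 and each instance adds its membership probability for the cluster. Two features of Table B.1 are consistent with this reading. Values that never occur in a cluster appear as exactly 1, and for Cluster0 the court [total] (571,4867, over 25 courts) is exactly one higher than the keyword [total] (570,4867, over 24 keywords), which is what one pseudo-count per value would produce. The function `laplace_counts` and its inputs are hypothetical; the thesis does not state how WEKA computes these numbers internally.

```python
def laplace_counts(values, weights, categories):
    """Add-one (Laplace) corrected counts of a nominal attribute for one cluster.

    values[i] is the attribute value of instance i and weights[i] is that
    instance's (soft) membership probability for the cluster.
    """
    counts = {c: 1.0 for c in categories}  # one pseudo-count per category
    for value, weight in zip(values, weights):
        counts[value] += weight
    counts["[total]"] = sum(counts[c] for c in categories)
    return counts


counts = laplace_counts(["ingevolge", "krachtens", "ingevolge"],
                        [0.5, 1.0, 0.25],
                        ["ingevolge", "krachtens", "inzake"])
print(counts)
# {'ingevolge': 1.75, 'krachtens': 2.0, 'inzake': 1.0, '[total]': 4.75}
```

Dividing each corrected count by the [total] gives a smoothed probability that is never zero, so an unseen keyword does not make a cluster impossible for an instance.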
B.2 Final round
Normalised Attribute
0 0,02
1 0,11
2 0,11
3 0,3
4 0,08
5 0,07
6 0,09
7 0,1
8 0,01
9 0,01
10 0,06
11 0,04
pos_ref mean std. dev.
0,5861 0,4626
0,0164 0,0126
0,3759 0,2585
0,4221 0,2768
0,0207 0,0159
0,8693 0,0764
0,3426 0,2157
0,325 0,3294
0,5538 0,2031
0,4963 0,2812
0,291 0,2249
0,3144 0,2677
uitspraak_year mean std. dev.
-6,686 3,6766
-7,5811 4,0095
-5,7347 3,4412
-7,0238 4,2052
-6,0541 3,9078
-7,7695 4,1758
-7,4319 4,1044
-5,7061 3,7811
-13,2954 1,0553
-2,5699 2,2454
-6,0998 3,566
-1,6681 0,8219
304,0098
0,031
0,072
0,425
0,131
0,452
3,556
1,193
4,688
0,585
0,097
0,019
0,902
179,0099
0,070
0,282
0,825
0,831
0,803
0,508
3,141
1,609
2,650
0,097
0,111
1,244
1075,0102
0,006
0,738
1,332
0,398
0,329
0,375
0,697
0,607
0,781
0,026
7,390
0,638
916,0101 103,0101
0,041 0,211
1,099 1,174
0,607 0,709
1,980 0,436
0,523 0,292
0,444 0,864
1,155 0,905
0,328 1,173
0,114 0,037
0,022 36,427
0,028 0,021
0,117 0,232
458,0101
0,020
0,909
1,191
1,219
0,690
1,942
1,289
0,594
1,915
0,079
0,094
0,271
33,0101
0,018
1,056
1,346
0,423
0,000
0,000
0,333
5,452
0,001
0,018
0,048
0,749
819,0101
0,010
0,397
0,652
0,618
4,621
2,516
1,420
0,162
0,025
0,059
0,006
0,200
3141,0102 13,01 104,0101
0,003 0,179 0,274
1,299 1,315 0,387
1,126 2,701 0,778
1,208 1,009 2,414
0,840 0,007 0,017
0,584 0,001 0,014
1,100 1,662 0,804
0,899 0,001 0,112
0,331 7,589 1,346
0,015 0,054 1,177
0,031 0,359 0,072
1,710 0,071 0,668
154,0099
49,538
0,058
0,000
0,000
0,022
0,002
0,010
0,000
0,000
0,001
0,001
0,000
22,0102
0,164
1,519
0,005
0,524
0,562
0,001
3,954
0,000
0,035
5,508
2,847
1,108
59,0097
0,168
3,947
0,978
0,636
0,364
0,519
0,987
0,343
0,022
1,932
0,137
1,188
161,0101 42,0099 9,0101 91,01
0,334 0,110 0,041 0,121
0,124 0,069 0,925 0,367
0,136 0,644 2,011 0,400
0,083 2,639 1,072 0,414
0,252 0,502 0,000 0,004
0,981 0,001 1,544 0,713
0,142 0,846 0,027 1,013
8,223 0,000 2,221 6,135
0,839 0,067 0,539 0,012
0,063 0,054 0,000 0,818
0,016 0,160 0,257 0,073
0,145 0,003 0,000 0,531
86,0101
0,105
1,107
2,594
0,701
0,499
0,982
1,132
0,618
0,094
0,915
0,249
2,074
13,0099 38,0101
0,002 0,125
2,011 4,502
0,000 1,801
0,638 0,433
0,958 1,214
0,000 0,013
0,003 0,407
4,612 0,135
0,000 1,614
0,091 0,224
0,794 0,082
0,000 0,064
27,0101
0,013
1,872
0,892
2,042
0,000
0,528
0,003
0,000
3,775
0,020
0,127
0,000
36,0099 57,0099 17,01
0,496 0,107 0,384
0,388 0,602 0,188
0,527 1,311 2,889
1,073 1,107 1,483
0,522 0,158 0,004
0,334 0,003 0,003
2,781 0,196 0,026
0,286 1,076 0,015
0,043 0,006 11,755
1,715 30,923 0,045
0,154 0,036 0,527
4,909 0,139 1,359
28,0101
0,170
1,213
2,334
0,684
0,456
2,867
0,033
0,002
7,280
5,710
0,507
0,003
9,01 3,0099
1,235 0,073
0,863 0,006
2,002 0,072
1,124 1,004
0,000 3,582
0,002 0,000
0,957 3,450
2,220 0,001
0,000 0,000
0,457 2,140
0,147 0,521
0,002 0,886
2,01
0,254
0,001
4,496
0,000
0,004
2,062
0,139
0,000
0,000
8,517
0,002
6,318
1,0101 8348,01 -11,99 1532,0101 6456,0101 9,0099 8036,01 -11,99 807,0099 44,0099 3667,0101 58,0101 2016,01 25,01 154,0102 45,0099 59,01 701,01 153,01 29,0102 13,0099 39,01 45,01 30,0099 8672,01 -11,99
0,015 1,122 4,170 0,036 1,218 0,081 1,004 4,170 0,001 0,076 0,000 0,004 3,885 0,036 0,002 0,030 0,009 0,002 0,004 0,025 0,301 0,034 0,021 0,319 1,236 4,170
0,000 0,952 0,758 0,318 1,112 1,728 0,960 0,758 0,006 0,080 1,713 0,049 0,001 6,206 0,062 2,688 0,138 1,225 0,026 0,030 0,910 0,782 1,015 1,411 0,945 0,758
8,995 0,970 0,758 0,917 0,996 0,006 0,978 0,758 0,000 1,726 0,000 0,015 0,001 0,243 8,760 5,976 4,002 7,792 1,811 0,032 1,775 0,209 0,099 0,562 0,962 0,758
0,000 0,986 0,278 0,957 1,032 0,385 1,013 0,278 0,000 0,015 2,211 0,002 0,000 0,034 0,001 0,011 0,006 0,000 0,004 0,005 0,244 0,029 0,013 0,044 0,959 0,278
0,000 1,018 1,043 1,578 0,886 0,084 1,017 1,043 0,004 2,254 0,001 0,037 3,665 0,114 0,044 0,073 4,401 0,028 1,518 0,042 0,580 0,448 0,045 0,105 1,019 1,043
0,000 0,926 1,191 0,323 1,055 1,440 0,916 1,191 0,002 0,188 0,005 0,015 3,365 0,000 0,008 0,191 0,971 0,007 0,004 0,889 0,420 0,648 1,108 5,650 0,936 1,191
0,001 1,080 0,927 0,676 1,186 0,002 1,086 0,927 0,000 0,115 0,000 0,009 4,087 1,178 0,006 0,082 0,037 0,003 0,053 0,044 1,429 4,768 1,335 0,176 1,074 0,927
0,001 1,034 0,834 0,817 1,087 4,452 1,042 0,834 9,979 0,036 0,000 0,008 0,001 0,095 0,009 0,036 0,024 0,002 0,012 5,922 1,822 0,102 0,114 0,652 1,026 0,834
0,000 0,886 8,340 0,050 0,670 10,972 0,597 8,340 0,058 1,657 0,012 0,000 0,012 5,405 0,054 0,214 8,894 0,241 0,738 0,003 11,647 15,133 1,117 0,274 1,165 8,340
0,000 1,129 8,340 4,224 0,006 1,687 0,850 8,340 0,002 0,357 1,443 0,947 0,238 3,513 0,258 0,088 0,061 0,052 0,023 0,038 1,003 0,033 0,026 8,369 1,398 8,340
0,003 1,054 1,390 1,016 1,047 0,006 1,041 1,390 0,001 0,059 2,216 0,023 0,001 0,023 0,007 0,143 0,054 0,005 0,017 0,026 1,851 0,053 1,119 0,038 1,067 1,390
0,000 1,002 2,085 4,533 0,107 0,273 0,960 2,085 0,006 14,098 0,004 24,392 0,573 1,603 0,505 0,027 0,599 0,037 16,506 8,177 0,011 5,153 14,543 4,734 1,042 2,085
396,0099
0,000
0,001
0,000
0,000
0,000
0,000
0,000
9,999
0,001
0,000
0,000
0,000
17,0099
0,000
6,692
0,000
0,875
0,000
0,004
0,000
0,000
0,019
0,000
0,006
0,000
23,01
0,000
0,876
0,001
2,673
0,000
0,001
0,000
0,001
0,000
1,376
1,446
0,014
ref_keyword: (in/met) strijd(ige) (oplevert/gehandeld) (met); in de zin van; (als/in) bedoeld(e) (in/onder); op grond van; gelet op; (in …) (is) bepaald(e); moet … getoetst worden; met/om toepassing van/aan; ingevolge; op basis van; krachtens; … (is) niet van toepassing (is); in afwijking van; in samenhang met …; schending van; (niet) op de voet van; blijkens; (zich/met) beroep(en) op; (in … is) neergelegd (in); inzake; juncto (/jo); als genoemd in …; bevat de bevoegdheid; voldoen aan; luidt … :; (in …) (is) opgenomen (in …); volgt uit; in … beschreven; in … voorgeschreven; (als) vermeld (in/onder); [total]
uitspraak_hof: Raad van State; Rechtbank; Gerechtshof; [total]
ref_bwb: BWBV0001000 BWBR0003738 BWBR0011823 BWBV0001506 BWBR0005537 BWBR0006685 BWBV0001002 BWBR0001854 BWBR0017959 BWBR0011825 BWBR0002757 BWBV0004701 BWBR0001840 BWBR0012002 BWBR0007149 BWBV0002508 [total]
ref_url: http://doc.metalex.eu:8080/page/id/BWBV0001000/artikel/3 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/34a http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/62 http://doc.metalex.eu:8080/page/id/BWBV0001506/artikel/20 http://doc.metalex.eu:8080/page/id/BWBV0001000/artikel/8 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/8 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/33 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/29 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A74 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/7%3A2 http://doc.metalex.eu:8080/page/id/BWBV0001002/artikel/1F http://doc.metalex.eu:8080/page/id/BWBR0017959/artikel/17
Totaal
1
15,0098
0,013
0,186
0,027
0,005
0,128
0,050
0,023
0,019
0,000
0,019
0,004
23,896
311,0101
0,000
0,000
0,000
0,000
0,000
0,000
0,000
9,998
0,002
0,000
0,000
0,000
100,0098
0,000
1,050
0,000
1,845
0,000
0,000
0,000
0,000
0,005
0,148
5,489
0,001
17,0099
0,000
0,811
0,001
0,620
0,000
0,000
0,001
0,000
0,000
0,004
12,066
0,000
495,0099
0,000
2,267
0,000
2,388
0,000
0,000
0,000
0,000
0,000
0,017
0,568
0,000
54,0101
0,004
0,000
0,000
0,000
2,989
10,839
0,020
0,000
0,001
0,000
0,000
0,000
19,0101
0,064
0,001
0,000
0,000
0,491
9,521
3,232
0,002
0,002
0,003
0,000
0,030
61,01
0,001
0,050
9,011
0,000
0,020
0,013
0,001
0,001
0,001
0,000
0,003
0,003
19,0101
0,004
0,041
6,435
0,002
3,218
0,007
0,019
0,022
0,000
0,067
0,069
0,493
Attribute http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/50 http://doc.metalex.eu:8080/page/id/BWBV0001000/artikel/5 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/6%3A6 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/3 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A81 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/3%3A2 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A82 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/64 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A54 http://doc.metalex.eu:8080/page/id/BWBR0002757/artikel/2 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/85 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A86 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/59 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/4%3A8 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A70 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/1 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/31 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/14 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/9 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/19 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/47 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/91 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.4 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.71 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A75 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/28 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/94 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/106 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.13 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/3%3A46 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A77 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/67
Totaal
1
0 0,02
1 0,11
2 0,11
3 0,3
4 0,08
5 0,07
6 0,09
7 0,1
8 0,01
9 0,01
10 0,06
11 0,04
64,0099
0,000
0,771
0,000
2,920
0,000
0,000
0,000
0,000
0,009
0,095
0,630
0,002
25,0099
0,000
0,002
0,001
0,000
0,000
0,007
0,000
9,969
0,010
0,006
0,001
0,040
155,0098
48,787
0,000
0,000
0,000
0,178
0,063
0,061
0,000
0,002
0,001
0,000
0,001
51,01
0,000
1,273
0,000
2,801
0,001
0,000
0,000
0,000
0,000
0,842
0,176
0,007
114,01
0,007
0,000
0,000
0,000
0,584
2,514
8,628
0,000
0,002
0,000
0,000
0,012
70,0099
0,025
0,000
0,000
0,000
1,449
5,472
5,006
0,000
0,004
0,552
0,000
1,105
20,0099
0,004
0,000
0,000
0,000
3,737
9,984
0,017
0,000
0,000
0,000
0,000
0,000
80,0101
0,000
1,666
0,000
2,655
0,002
0,001
0,000
0,000
0,001
0,189
0,298
0,001
152,0101
0,001
0,000
0,000
0,000
11,941
0,000
0,493
0,000
0,000
0,001
0,000
0,007
43,01
0,001
0,012
1,620
0,001
1,624
0,000
0,012
0,005
0,834
0,021
0,026
16,961
155,01
0,000
3,135
0,000
1,873
0,000
0,001
0,000
0,000
0,000
9,308
0,002
0,001
68,01
0,007
0,000
0,000
0,000
0,218
2,907
8,648
0,000
0,001
0,001
0,000
0,010
228,0103
0,000
0,976
0,000
2,952
0,000
0,000
0,000
0,000
0,000
0,048
0,105
0,001
16,0101
0,142
0,001
0,001
0,000
1,669
0,040
7,048
0,002
0,027
0,304
0,001
5,553
34,0099
0,004
0,002
0,000
0,000
12,094
0,417
0,030
0,000
0,000
0,000
0,000
0,000
55,0099
0,000
4,083
0,000
1,687
0,001
0,001
0,000
0,000
0,005
0,013
0,734
0,003
274,0099
0,000
0,852
0,000
2,953
0,000
0,000
0,000
0,000
0,000
0,030
0,331
0,001
172,01
0,000
0,718
0,000
0,864
0,000
0,000
0,000
0,000
0,000
0,009
11,027
0,002
21,01
0,000
0,469
0,000
1,058
0,001
0,003
0,000
0,000
0,003
0,053
10,492
0,003
17,0098
0,000
1,270
0,000
2,853
0,000
0,000
0,001
0,001
0,066
0,024
0,036
0,016
1,01
0,000
5,281
0,004
0,803
0,028
0,000
0,002
0,002
0,000
4,257
2,046
0,000
39,01
0,000
0,192
0,000
0,485
0,000
0,001
0,000
0,000
0,006
83,242
0,009
0,000
43,0099
0,001
1,096
7,973
0,000
0,015
0,001
0,000
0,000
0,021
0,018
0,001
0,008
102,01
0,000
0,895
8,187
0,000
0,009
0,001
0,000
0,000
0,002
0,001
0,000
0,002
233,0101
0,001
0,000
0,000
0,000
7,804
5,049
0,244
0,000
0,000
0,004
0,000
0,005
229,0101
0,000
0,079
0,000
0,153
0,000
0,000
0,000
0,000
0,000
0,001
15,757
0,000
114,0099
0,000
1,481
0,000
2,769
0,000
0,000
0,000
0,000
0,001
0,524
0,014
0,002
63,01
0,000
2,125
0,000
2,465
0,001
0,007
0,000
0,000
0,002
0,086
0,419
0,001
15,01
0,000
0,249
8,818
0,000
0,004
0,000
0,001
0,001
0,002
0,003
0,004
0,028
47,0099
0,031
0,000
0,000
0,000
0,011
11,880
0,591
0,001
0,000
0,036
0,000
2,826
41,0099
0,003
0,001
0,000
0,000
11,484
0,350
0,382
0,000
0,007
0,001
0,000
0,545
116,01
0,000
1,533
0,000
2,709
0,000
0,000
0,000
0,000
0,000
0,056
0,297
0,001
Attribute http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/4%3A6 http://doc.metalex.eu:8080/page/id/BWBV0004701/artikel/41 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/30 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/11 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/6 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/16 http://doc.metalex.eu:8080/page/id/BWBV0001002/artikel/1%28F%29 http://doc.metalex.eu:8080/page/id/BWBV0001000/artikel/6 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/24 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A83 http://doc.metalex.eu:8080/page/id/BWBV0001506/artikel/45 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/13 http://doc.metalex.eu:8080/page/id/BWBR0002757/artikel/15 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/45 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A84 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/4.21 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/1%3A3 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/26 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/4.17a http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.6 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/6%3A5 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A72 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A29 http://doc.metalex.eu:8080/page/id/BWBR0007149/artikel/2 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/96 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/7%3A3 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/72 http://doc.metalex.eu:8080/page/id/BWBR0002757/artikel/19d http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/15 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/4%3A84 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/69 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.86
Totaal
1
0 0,02
1 0,11
2 0,11
3 0,3
4 0,08
5 0,07
6 0,09
7 0,1
8 0,01
9 0,01
10 0,06
11 0,04
120,0101
0,009
0,000
0,000
0,000
1,988
1,618
8,080
0,000
0,001
0,001
0,000
0,005
26,01
0,016
0,033
0,034
0,005
0,047
0,813
0,046
5,980
0,003
0,009
0,029
8,141
44,0099
0,000
0,706
0,000
3,006
0,000
0,000
0,000
0,000
0,001
0,007
0,334
0,001
30,0099
0,000
0,166
0,000
3,260
0,000
0,000
0,000
0,000
0,013
0,000
0,055
0,000
109,01
0,000
0,506
0,000
2,995
0,000
0,001
0,000
0,000
0,002
0,040
0,756
0,003
61,0098
0,000
0,437
0,000
3,046
0,000
0,002
0,000
0,000
0,011
0,015
0,623
0,003
64,01
0,000
0,037
9,025
0,000
0,020
0,002
0,001
0,001
0,001
0,005
0,002
0,020
22,0099
0,001
0,006
0,001
0,000
0,028
0,002
0,001
9,949
0,015
0,003
0,002
0,026
14,0101
0,002
0,001
0,001
3,303
0,000
0,002
0,001
0,000
0,000
0,029
0,120
0,012
29,01
0,014
0,000
0,001
0,000
0,675
2,087
8,783
0,000
0,001
0,016
0,005
0,205
14,0099
0,000
0,000
0,002
0,000
0,000
0,002
0,001
0,001
0,000
3,527
0,000
24,087
30,0099
0,000
0,766
0,000
2,987
0,002
0,000
0,000
0,000
0,001
0,039
0,309
0,005
12,0099
0,001
0,019
0,839
0,002
3,903
0,006
0,020
0,010
0,000
0,033
0,001
14,708
25,0099
0,000
3,237
0,000
2,085
0,001
0,001
0,000
0,000
0,008
0,006
0,291
0,001
19,01
0,017
0,002
0,001
0,000
10,846
0,000
1,446
0,000
0,000
0,000
0,009
0,010
23,0098
0,000
0,066
9,006
0,000
0,003
0,000
0,000
0,000
0,044
0,003
0,007
0,010
33,01
0,030
0,000
0,000
0,000
0,590
0,719
9,846
0,000
0,009
0,009
0,002
0,376
30,0099
0,000
0,326
0,000
3,204
0,001
0,002
0,000
0,000
0,018
0,006
0,030
0,005
21,0099
0,000
0,480
8,513
0,000
0,014
0,000
0,000
0,000
0,000
0,545
0,000
0,089
20,01
0,001
0,537
8,525
0,000
0,008
0,005
0,001
0,001
0,032
0,019
0,002
0,020
93,0101
0,005
0,000
0,000
0,000
5,752
6,442
0,894
0,000
0,004
0,000
0,000
0,204
50,0101
0,012
0,000
0,000
0,000
2,560
7,919
2,614
0,000
0,001
0,020
0,000
0,123
13,0101
0,211
0,000
0,001
0,000
0,055
1,416
9,782
0,001
0,018
0,032
0,003
0,255
17,0099
0,015
1,207
0,058
0,006
0,056
0,005
1,034
0,177
0,000
0,042
0,038
18,499
37,0099
0,000
2,432
0,000
2,400
0,000
0,001
0,000
0,000
0,006
0,002
0,201
0,000
54,0099
0,033
0,000
0,000
0,000
0,705
7,189
4,786
0,000
0,005
0,229
0,000
0,161
45,0102
0,000
2,437
0,000
2,310
0,000
0,001
0,000
0,000
0,001
0,004
0,639
0,000
22,01
0,000
0,062
0,054
0,000
2,236
0,000
0,002
0,002
0,000
0,000
0,000
20,183
12,0101
0,023
0,000
0,002
0,000
4,525
0,007
6,985
0,001
0,000
0,419
0,014
0,051
63,01
0,072
0,001
0,000
0,000
2,618
2,537
6,755
0,000
0,010
0,061
0,001
0,062
28,01
0,000
2,467
0,000
2,416
0,001
0,002
0,000
0,000
0,000
0,019
0,046
0,007
15,0099
0,000
0,723
8,300
0,000
0,076
0,001
0,001
0,001
0,000
0,030
0,001
0,003
Attribute http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/118 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/83 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/66a http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/6%3A7 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.107 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/18 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/33a http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/84 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/33b http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/17 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/7a http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/7%3A12 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/15 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/16a http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/15c http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/35 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/14 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/73 http://doc.metalex.eu:8080/page/id/BWBR0002757/artikel/18 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/117 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/71 http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/32 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/4%3A5 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.52 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/8%3A68 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.109 http://doc.metalex.eu:8080/page/id/BWBV0002508/artikel/3 http://doc.metalex.eu:8080/page/id/BWBR0002757/artikel/19a http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/ http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.30 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/3.106 http://doc.metalex.eu:8080/page/id/BWBR0011825/artikel/6.5a
Totaal
1
0 0,02
1 0,11
2 0,11
3 0,3
4 0,08
5 0,07
6 0,09
7 0,1
8 0,01
9 0,01
10 0,06
11 0,04
15,0101
0,000
1,059
0,000
2,643
0,000
0,004
0,001
0,000
0,021
0,000
1,488
0,000
62,01
0,000
1,261
0,000
2,797
0,000
0,001
0,000
0,000
0,001
0,101
0,346
0,001
32,0101
0,000
0,501
0,000
2,871
0,002
0,000
0,000
0,000
0,000
0,665
1,252
0,030
12,01
0,157
0,000
0,001
0,000
0,000
1,354
9,644
0,001
0,038
0,755
0,001
0,628
12,0101
0,002
5,344
3,723
0,000
0,013
0,005
0,001
0,001
0,001
0,001
0,000
0,001
15,0099
0,000
0,594
0,001
2,548
0,000
0,001
0,000
0,000
0,000
0,114
2,789
0,024
37,0099
0,000
8,456
0,000
0,228
0,000
0,002
0,000
0,000
0,045
0,000
0,011
0,000
12,0101
0,004
4,422
0,000
1,698
0,001
0,012
0,000
0,000
0,002
0,054
0,027
0,003
16,0098
0,001
0,000
0,000
3,304
0,000
0,090
0,001
0,000
0,030
0,000
0,026
0,000
50,0101
0,000
3,323
0,000
2,074
0,000
0,000
0,000
0,000
0,000
0,130
0,176
0,004
10,0102
0,000
0,818
0,000
3,017
0,000
0,004
0,001
0,000
0,045
0,041
0,031
0,018
46,0099
0,072
0,000
0,000
0,000
0,972
10,895
1,721
0,002
0,003
0,031
0,001
0,062
58,0099
0,000
2,604
0,000
1,684
0,000
0,000
0,000
0,000
0,018
0,248
3,423
0,005
26,01
0,000
3,652
0,000
1,987
0,000
0,002
0,000
0,000
0,058
0,000
0,015
0,000
16,01
0,000
0,023
0,000
3,319
0,000
0,004
0,001
0,000
0,027
0,000
0,009
0,000
13,0097
0,000
1,188
0,000
2,737
0,001
0,000
0,001
0,000
0,005
0,050
0,777
0,007
12,01
0,000
1,667
7,394
0,000
0,006
0,000
0,001
0,000
0,000
0,003
0,018
0,018
13,0099
0,000
1,971
0,000
2,604
0,001
0,003
0,001
0,000
0,010
0,000
0,014
0,000
12,0099
0,001
0,052
1,803
0,003
0,760
0,000
0,032
0,015
0,000
0,002
0,002
18,221
16,0098
0,000
1,441
0,000
2,799
0,000
0,000
0,001
0,000
0,000
0,000
0,018
0,000
75,0099
0,000
8,958
0,000
0,048
0,000
0,000
0,000
0,000
0,002
0,000
0,002
0,000
38,0101
0,000
0,695
0,000
2,952
0,000
0,001
0,000
0,000
0,009
0,026
0,622
0,001
48,0099
0,973
0,000
0,000
0,000
0,883
1,512
8,927
0,000
0,004
0,022
0,000
0,002
14,0102
0,002
0,705
8,352
0,000
0,005
0,007
0,002
0,003
0,000
0,014
0,002
0,034
12,0101
0,184
0,000
0,001
0,000
1,317
0,000
9,878
0,000
0,000
0,002
0,000
0,021
13,0099
0,003
2,020
7,026
0,000
0,034
0,001
0,002
0,006
0,013
0,001
0,000
0,008
15,0102
0,437
0,021
0,623
0,072
0,022
5,438
0,274
1,073
0,547
12,799
0,028
6,215
21,0101
0,001
0,040
1,864
0,002
0,303
0,000
0,020
0,008
0,000
0,001
0,063
18,976
18,0098
0,000
2,863
0,000
2,274
0,001
0,000
0,000
0,000
0,003
0,039
0,016
0,022
16,01
0,000
0,000
9,024
0,000
0,000
0,000
0,001
0,001
0,000
0,443
0,001
0,051
14,0099
0,002
1,321
7,739
0,000
0,007
0,004
0,001
0,000
0,051
0,027
0,002
0,015
13,0102
0,002
0,731
8,322
0,000
0,012
0,003
0,001
0,002
0,000
0,015
0,000
0,043
Attribute http://doc.metalex.eu:8080/page/id/BWBR0011823/artikel/82 http://doc.metalex.eu:8080/page/id/BWBR0005537/artikel/6%3A20 [total]
Totaal
1
0 0,02
1 0,11
2 0,11
3 0,3
4 0,08
5 0,07
6 0,09
7 0,1
8 0,01
9 0,01
10 0,06
11 0,04
11,0099
0,000
2,003
0,001
2,530
0,003
0,001
0,001
0,000
0,057
0,043
0,304
0,003
12,0099
0,098
0,001
0,001
0,000
1,588
0,067
9,604
0,000
0,070
0,002
0,000
0,002
16998,7838
1304,1614
1560,207
1576,8812
3154,8097
1366,0593
1227,3901
1497,3759
1548,9562
759,9996
780,2695
1214,1206
1020,5433
Time taken to build model (full training data): 592.75 seconds
Clustered Instances
0     155 ( 2%)
1     921 (12%)
2     856 (11%)
3    2406 (30%)
4     669 ( 8%)
5     522 ( 7%)
6     758 ( 9%)
7     831 (10%)
8      45 ( 1%)
9      64 ( 1%)
10    463 ( 6%)
11    310 ( 4%)
Log likelihood: -10.69442
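The percentages in the clustering summary follow directly from the cluster sizes, which add up to 8000 instances. The short Python check below reproduces them. These counts are presumably hard assignments (each instance counted once, in its most likely cluster), so they can differ slightly from the relative frequencies listed at the top of the table, for example 12% for cluster 1 against 0,11.

```python
# Cluster sizes as listed under "Clustered Instances" above.
sizes = [155, 921, 856, 2406, 669, 522, 758, 831, 45, 64, 463, 310]

total = sum(sizes)  # number of clustered instances
shares = [round(100 * n / total) for n in sizes]  # whole percentages

print(total, shares)
# 8000 [2, 12, 11, 30, 8, 7, 9, 10, 1, 1, 6, 4]
```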
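The ref_url values in the final round share one pattern: the prefix http://doc.metalex.eu:8080/page/id/, a BWB identifier, /artikel/, and a percent-encoded article label, so that article 8:74 of BWBR0005537 appears as .../artikel/8%3A74. A sketch of that mapping in Python follows; the helper `metalex_article_url` is hypothetical, and the thesis does not state how the parser builds these URLs. The same table lists both .../BWBV0001002/artikel/1F and .../BWBV0001002/artikel/1%28F%29 as separate values, which suggests that article labels were not normalised before encoding.

```python
from urllib.parse import quote

# Prefix shared by the ref_url values in the final-round table.
METALEX_BASE = "http://doc.metalex.eu:8080/page/id/"


def metalex_article_url(bwb_id, article):
    """Hypothetical helper: build an article URL for a BWB identifier.

    quote(..., safe="") percent-encodes ':' as %3A and '(' / ')' as %28 / %29,
    while letters, digits and '.' are left unchanged.
    """
    return f"{METALEX_BASE}{bwb_id}/artikel/{quote(article, safe='')}"


print(metalex_article_url("BWBR0005537", "8:74"))
print(metalex_article_url("BWBV0001002", "1(F)"))
```

Normalising labels such as "1F" and "1(F)" to one form before encoding would merge what are now two separate attribute values for the same article.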
Appendix C
Example of a generated ECLI file
ECLI:NL:RBSGR:2001:AE0209 text/xml public 2013-04-04T17:38:59 2013-04-04 Raad voor de Rechtspraak nl AE0209 Rechtbank 's-Gravenhage 2001-12-17 00/72045 OVERIO GR Uitspraak Bodemzaak Eerste aanleg - meervoudig NL Bestuursrecht; Vreemdelingenrecht Rechtspraak.nl Artikel 119 Vw 2000 Met betrekking tot het procedurele recht overweegt de rechtbank het volgende. Artikel 119 Vw 2000 beperkt de toepassing van het recht dat gold vóór invoering van deze wet uitsluitend tot de mogelijkheid om beroep in te stellen, het griffierecht en de schorsende werking, zodat voor het overige het nieuwe recht van toepassing is. De rechtbank dient dus met ingang van 1 april 2001 bij de beoordeling van het beroep toepassing te geven aan het bepaalde in artikel 83 Vw 2000 en rekening te houden met feiten en omstandigheden die na het nemen van de bestreden beschikkingen zijn opgekomen, tenzij de goede procesorde zich daartegen verzet of de afdoening van de zaak daardoor ontoelaatbaar wordt vertraagd. artikel 83 Vw 2000 Met betrekking tot het procedurele recht overweegt de rechtbank het volgende. Artikel 119 Vw 2000 beperkt de toepassing van het recht dat gold vóór invoering van deze wet uitsluitend tot de mogelijkheid om beroep in te stellen, het griffierecht en de schorsende werking, zodat voor het overige het nieuwe recht van toepassing is. De rechtbank dient dus met ingang van 1 april 2001 bij de beoordeling van het beroep toepassing te geven aan het bepaalde in artikel 83 Vw 2000 en rekening te houden met feiten en omstandigheden die na het nemen van de bestreden beschikkingen zijn opgekomen, tenzij de goede procesorde zich daartegen verzet of de afdoening van de zaak daardoor ontoelaatbaar wordt vertraagd. artikel 15 Vreemdelingenwet (Vw)
2.2 Op grond van artikel 15 Vreemdelingenwet (Vw) in samenhang met artikel 1(A) van het Verdrag van Genève betreffende de status van vluchtelingen kunnen vreemdelingen die afkomstig zijn uit een land waarin zij gegronde reden hebben te vrezen voor vervolging wegens hun godsdienstige, levensbeschouwelijke of politieke overtuiging of hun nationaliteit, dan wel wegens het behoren tot een bepaald ras of een bepaalde sociale groep, als vluchteling worden toegelaten. http://deeplink.rechtspraak.nl/uitspraak?id=ECLI:NL:RBSGR:2001:AE0209 text/html public 2013-04-04T17:38:59 2002-03-15 Raad voor de Rechtspraak nl ECLI:NL:RBSGR:2001:AE0209 Rechtbank 's-Gravenhage , 17-12-2001 / 00/72045 OVERIO GR <preserve:inhoudsindicatie id="ECLI:NL:RBSGR:2001:AE0209:INH" lang="nl" xml:space="preserve"> <preserve:parablock> <preserve:para> Azerbeidzjan / Nagorno-Karabach / vestigingsalternatief Armenië. <preserve:para> De rechtbank is van oordeel dat de bestreden beschikkingen op onzorgvuldige wijze zijn gemotiveerd. Hiertoe is allereerst redengevend dat verweerder in de bestreden beschikkingen heeft overwogen dat het onaannemelijk is dat er na 1992 nog gemengd gehuwden, zoals eisers, in Nagorno-Karabach voorkomen. Verweerder heeft zich hierbij gebaseerd op genoemd ambtsbericht d.d. 28 december 1999, waarin onder meer staat vermeld dat Nagorno-Karabach uitsluitend bewoond wordt door etnische Armeniërs en dat etnische Azeri, echtparen in een gemengd huwelijk, alsmede de eventuele kinderen daarvan, Nagorno-Karabach tussen 1988 en 1992 hebben verlaten. <preserve:para> De rechtbank kan verweerder hierin niet volgen. De rechtbank leest deze passage in het ambtsbericht niet zo absoluut als verweerder en acht het niet volledig uitgesloten dat er - weliswaar in uiterst beperkte aantallen - toch nog gemengd gehuwden voorkomen in Nagorno-Karabach.
Het voert de rechtbank in ieder geval te ver om uitsluitend op grond van deze passage in het ambtsbericht het relaas van eisers als ongeloofwaardig af te doen. <preserve:para> De rechtbank is voorts van oordeel dat verweerder in de bestreden beschikkingen zich ten onrechte op het standpunt heeft gesteld dat eisers een vestigingsalternatief hebben in Armenië. Verweerder heeft gesteld dat eisers kunnen opteren voor het Armeense staatsburgerschap. De rechtbank is evenwel van oordeel dat deze omstandigheid geen zelfstandige afwijzingsgrond voor toelating in Nederland kan vormen. Vluchtelingschap dient allereerst bezien te worden in relatie tot het land waarvan de vreemdeling de nationaliteit bezit, dan wel het land van bestendig verblijf. Armenië is niet als zodanig te beschouwen ten opzichte van eisers. Ook de in de Vw geregelde situatie van land van eerder verblijf is in het geval van eisers niet van toepassing nu zij vanuit Nagorno-Karabach zijn vertrokken naar Rusland. Beroep gegrond. <preserve:uitspraak id="ECLI:NL:RBSGR:2001:AE0209:DOC" lang="nl" xml:space="preserve"> <preserve:parablock> <preserve:para>ARRONDISSEMENTSRECHTBANK TE 's-GRAVENHAGE <preserve:para>Zittingsplaats Zwolle <preserve:para>Vreemdelingenkamer <preserve:para/> <preserve:para>regnr.: Awb 00/72045 OVERIO GR <preserve:para/> <preserve:para>UITSPRAAK <preserve:para/> <preserve:parablock> <preserve:para>inzake: A,
APPENDIX C. EXAMPLE OF GENERATED ECLI FILE
<preserve:para>geboren op [...] 1959, <preserve:para>alsmede diens echtgenote B, <preserve:para>geboren op [...] 1959, <preserve:para>mede ten behoeve van hun minderjarige kinderen, <preserve:para>verblijvende te C, <preserve:para>van Azerbeidzjaanse nationaliteit, <preserve:para>IND dossiernummer 9911.11.2033, <preserve:para>eisers, <preserve:para> gemachtigde: mr. T.J.M. Wijngaard, advocaat te Den Bosch; <preserve:para/> <preserve:parablock> <preserve:para>tegen: DE STAATSSECRETARIS VAN JUSTITIE <preserve:para>(Immigratie- en Naturalisatiedienst), <preserve:para>te 's-Gravenhage, <preserve:para>verweerder, <preserve:para> gemachtigde: mr. E.E. van der Kamp, te 's-Gravenhage. <preserve:para/> <preserve:para>1 PROCESVERLOOP <preserve:para/> <preserve:para> 1.1 Op 12 november 1999 hebben eisers aanvragen om toelating als vluchteling ingediend. Bij beschikkingen van 8 juni 2000, uitgereikt op 3 juli 2000, heeft verweerder de aanvragen niet ingewilligd en ambtshalve beslist aan eisers geen vergunning tot verblijf op grond van klemmende redenen van humanitaire aard te verlenen. <preserve:para/> <preserve:para> 1.2 Eisers hebben daartegen bij brief van 27 juli 2000 bezwaar gemaakt. Bij beschikkingen van 21 november 2000 heeft verweerder het bezwaar ongegrond verklaard. <preserve:para/> <preserve:para> 1.3 Bij beroepschrift van 23 november 2000 hebben eisers beroep ingesteld tegen deze beschikkingen. Het beroep is ter zitting van 23 oktober 2001 behandeld. Eiser is daarbij verschenen, bijgestaan door zijn gemachtigde. Verweerder heeft zich doen vertegenwoordigen. <preserve:para/> <preserve:para>2 OVERWEGINGEN <preserve:para/> <preserve:parablock> <preserve:para> 2.1 In deze procedure dient te worden beoordeeld of de bestreden beschikkingen toetsing aan geschreven en ongeschreven rechtsregels kunnen doorstaan. <preserve:para> Op 1 april 2001 is de Vreemdelingenwet 2000 (Vw 2000) in werking getreden en is de Vreemdelingenwet (Vw) ingetrokken. 
De bestreden beschikkingen zijn bekendgemaakt vóór de inwerkingtreding van deze wet. Derhalve toetst de rechtbank de rechtmatigheid van de beschikkingen aan de bepalingen van de Vw. <preserve:para> Met betrekking tot het procedurele recht overweegt de rechtbank het volgende. Artikel 119 Vw 2000 beperkt de toepassing van het recht dat gold vóór invoering van deze wet uitsluitend tot de mogelijkheid om beroep in te stellen, het griffierecht en de schorsende werking, zodat voor het overige het nieuwe recht van toepassing is. De rechtbank dient dus met ingang van 1 april 2001 bij de beoordeling van het beroep toepassing te geven aan het bepaalde in artikel 83 Vw 2000 en rekening te houden met feiten en omstandigheden die na het nemen van de bestreden beschikkingen zijn opgekomen, tenzij de goede procesorde zich daartegen verzet of de afdoening van de zaak daardoor ontoelaatbaar wordt vertraagd. <preserve:para/> <preserve:para> 2.2 Op grond van artikel 15 Vreemdelingenwet (Vw) in samenhang met artikel 1(A) van het Verdrag van Genève betreffende de status van vluchtelingen kunnen vreemdelingen die afkomstig zijn uit een land waarin zij gegronde reden hebben te vrezen voor vervolging wegens hun godsdienstige, levensbeschouwelijke of politieke overtuiging of hun nationaliteit, dan wel wegens het behoren tot een bepaald ras of een bepaalde sociale groep, als vluchteling worden toegelaten. <preserve:para/>
<preserve:parablock> <preserve:para> 2.3 Het vluchtrelaas van eisers komt op het volgende neer. <preserve:para> Eiser behoort tot de etnisch Armeense bevolkingsgroep, eiseres tot de etnisch Azerbeidzjaanse bevolkingsgroep. Eisers zijn afkomstig uit Nagorny-Karabach. Vanwege de etnische afkomst van eiseres en hun gemengde huwelijk hebben eisers problemen ondervonden. Eiser is in november 1998, vanwege de afkomst van zijn echtgenote, beschuldigd van het leveren van wapens aan Azeri. Eiser is verhoord door militairen, waarbij zij twee van zijn vingers hebben gebroken. Eiser is vervolgens naar het ziekenhuis gebracht. Na twee dagen is eiser het ziekenhuis uit gelopen en direct met zijn gezin naar Minwod in de Russische Federatie vertrokken. Omdat zij niet over documenten beschikten, kregen zij ook daar problemen. Uiteindelijk zijn eisers naar Nederland vertrokken. <preserve:para/> <preserve:parablock> <preserve:para> 2.4 Verweerder heeft de aanvragen afgewezen, omdat geen aanleiding bestaat aan te nemen dat eisers als gemengd gehuwden na 1992 nog in Nagorny-Karabach hebben verbleven. Gewezen wordt op het ambtsbericht van de Minister van Buitenlandse Zaken d.d. 28 december 1999 (kenmerk DPC/AM-627435). Uit dit ambtsbericht blijkt immers dat de enclave Nagorny-Karabach uitsluitend bewoond wordt door etnische Armenen en dat er geen etnische Azeri meer voorkomen. De stelling van eisers dat er, gelet op een brief van VVN Rijnmond d.d. 4 mei 2000, nog wel gemengd gehuwden voorkomen in Nagorny-Karabach kan niet tot een ander oordeel leiden. <preserve:para> Voorts bestaat er een vestigingsalternatief voor eisers in Armenië. De inhoud van genoemd ambtsbericht duidt er niet op dat personen met een gemengde afkomst zoals eiseres, en gemengd gehuwden, deswege noemenswaardige problemen in Armenië ondervinden. 
<preserve:para/> <preserve:para> 2.5 Eisers stellen dat verweerder ten onrechte heeft gesteld dat in Nagorny-Karabach geen gemengd gehuwden meer voorkomen en dat eisers een vestigingsalternatief in Armenië hebben. Gelet hierop zijn de bestreden beschikkingen op onzorgvuldige wijze tot stand gekomen en berusten zij niet op een deugdelijke motivering. Bovendien is ten onrechte niet voldaan aan de hoorplicht. <preserve:para/> <preserve:para> 2.6 Vooropgesteld moet worden, dat niet is gebleken dat de politieke en mensenrechtensituatie in Azerbeidzjan (en in het bijzonder Nagorny-Karabach) zodanig is dat asielzoekers uit dat land zonder meer als vluchteling behoren te worden aangemerkt. Derhalve zal aannemelijk moeten zijn dat er met betrekking tot eisers persoonlijk feiten en omstandigheden bestaan waardoor zij gegronde reden hebben te vrezen voor vervolging in vluchtelingenrechtelijke zin. <preserve:para/> <preserve:parablock> <preserve:para> 2.7 De rechtbank is van oordeel dat de bestreden beschikkingen op onzorgvuldige wijze zijn gemotiveerd. Hiertoe is allereerst redengevend dat verweerder in de bestreden beschikkingen heeft overwogen dat het onaannemelijk is dat er na 1992 nog gemengd gehuwden, zoals eisers, in Nagorny-Karabach voorkomen. Verweerder heeft zich hierbij gebaseerd op genoemd ambtsbericht d.d. 28 december 1999, waarin onder meer staat vermeld dat Nagorny-Karabach uitsluitend bewoond wordt door etnische Armeniërs en dat etnische Azeri, echtparen in een gemengd huwelijk, alsmede de eventuele kinderen daarvan, Nagorny-Karabach tussen 1988 en 1992 hebben verlaten. <preserve:para> De rechtbank kan verweerder hierin niet volgen. De rechtbank leest deze passage in het ambtsbericht niet zo absoluut als verweerder en acht het niet volledig uitgesloten dat er -weliswaar in uiterst beperkte aantallen- toch nog gemengd gehuwden voorkomen in Nagorny-Karabach. 
Het voert de rechtbank in ieder geval te ver om uitsluitend op grond van deze passage in het ambtsbericht het relaas van eisers als ongeloofwaardig af te doen. <preserve:para/> <preserve:parablock> <preserve:para> De rechtbank is voorts van oordeel dat verweerder in de bestreden beschikkingen zich ten onrechte op het standpunt heeft gesteld dat eisers een vestigingsalternatief hebben in Armenië. Verweerder heeft gesteld dat eisers kunnen opteren voor het Armeense
staatsburgerschap. De rechtbank is evenwel van oordeel dat deze omstandigheid geen zelfstandige afwijzingsgrond voor toelating in Nederland kan vormen. Vluchtelingschap dient allereerst bezien te worden in relatie tot het land waarvan de vreemdeling de nationaliteit bezit, dan wel het land van bestendig verblijf. Armenië is niet als zodanig te beschouwen ten opzichte van eisers. Ook de in de Vw geregelde situatie van land van eerder verblijf is in het geval van eisers niet van toepassing nu zij vanuit Nagorny-Karabach zijn vertrokken naar Rusland. <preserve:para> Overigens merkt de rechtbank naar aanleiding van het gestelde in het verweerschrift met betrekking tot de Vw 2000 nog op dat ook in deze wet een aantal bepalingen voorkomt, waarbij verblijf in respectievelijk het bezitten van de nationaliteit van een ander land aan de vreemdeling wordt tegengeworpen. Het ligt in de onderhavige situatie, waarin de vreemdelingen noch de nationaliteit van dat andere land bezitten noch daar hebben verbleven, evenwel niet voor de hand om een verblijfsalternatief tegen te werpen. <preserve:para/> <preserve:para> 2.8 Reeds gelet op het vorenstaande kunnen de bestreden beschikkingen naar het oordeel van de rechtbank niet in stand blijven. <preserve:para/> <preserve:para>2.9 Het beroep is derhalve gegrond. <preserve:para/> <preserve:para> 2.10 De rechtbank ziet aanleiding om verweerder te veroordelen tot vergoeding van de door eisers gemaakte proceskosten en het door hen betaalde griffierecht. 
<preserve:para/> <preserve:para>3 BESLISSING <preserve:para/> <preserve:para>De rechtbank: <preserve:para/> <preserve:para>- verklaart het beroep gegrond; <preserve:para/> <preserve:para> - vernietigt de beschikkingen van 21 november 2000 en bepaalt dat verweerder nieuwe besluiten neemt met inachtneming van deze uitspraak; <preserve:para/> <preserve:para> - wijst de Staat der Nederlanden aan als rechtspersoon om het griffierecht ad ƒ 50,-- aan eisers te vergoeden; <preserve:para/> <preserve:parablock> <preserve:para> - veroordeelt verweerder in de proceskosten van eisers ad ƒ1.420,-- onder aanwijzing van de Staat der Nederlanden als rechtspersoon die deze kosten aan eisers <preserve:para>rechtbank dient te vergoeden. <preserve:para/> <preserve:para> Deze uitspraak is gedaan door mr. G. Blomsma, voorzitter, en mr. A.E.M. Effting-Zeguers en mr. A.A.J. Lemain, rechters, en in het openbaar uitgesproken door mr. G. Blomsma in tegenwoordigheid van mr. W.L.J. Fernhout als griffier op 17 december 2002 <preserve:para/> <preserve:parablock> <preserve:para> Tegen deze uitspraak staat geen gewoon rechtsmiddel open. <preserve:para>Afschrift verzonden: 17 december 2002